Sequencing of surface immobilized polymers utilizing microfluorescence detection

ABSTRACT

Means for simultaneous parallel sequence analysis of a large number of biological polymer macromolecules. Apparatus and methods may use fluorescent labels in repetitive chemistry to determine terminal monomers on solid phase immobilized polymers. Reagents which specifically recognize terminal monomers are used to label polymers at defined positions on a solid substrate.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 11/325,809, filed Jan. 5, 2006, which is a continuation of U.S. application Ser. No. 10/996,692, filed Nov. 23, 2004, which is a continuation of U.S. application Ser. No. 10/077,070, filed Feb. 14, 2002, which is a continuation of U.S. application Ser. No. 08/829,893, filed Apr. 2, 1997, now abandoned, which is a continuation of U.S. application Ser. No. 08/679,478, filed Jul. 12, 1996, now U.S. Pat. No. 5,902,723, which is a continuation of U.S. application Ser. No. 07/626,730, filed Dec. 6, 1990, now U.S. Pat. No. 5,547,839, the entire disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The present invention relates to the determination of the sequences of polymers immobilized to a substrate. In particular, one embodiment of the invention provides a method and apparatus for sequencing many nucleic acid sequences immobilized at distinct locations on a matrix surface. The principles and apparatus of the present invention may be used, for example, also in the determination of sequences of peptides, polypeptides, oligonucleotides, nucleic acids, oligosaccharides, phospholipids and other biological polymers. It is especially useful for determining the sequences of nucleic acids and proteins.

The structure and function of biological molecules are closely interrelated. The structure of a biological polymer, typically a macromolecule, is generally determined by its monomer sequence. For this reason, biochemists historically have been interested in the sequence characterization of biological macromolecule polymers. With the advent of molecular biology, the relationship between a protein sequence and its corresponding encoding gene sequence is well understood. Thus, characterization of the sequence of a nucleic acid encoding a protein has become very important.

Partly for this reason, the development of technologies providing the capability for sequencing enormous amounts of DNA has received great interest. Technologies for this capability are necessary, for example, for the successful completion of the human genome sequencing project. Structural characterization of biopolymers is very important for further progress in many areas of molecular and cell biology.

While sequencing of macromolecules has become extremely important, many aspects of these technologies have not advanced significantly over the past decade. For example, in the protein sequencing technologies being applied today the Edman degradation methods are still being used. See, e.g., Knight (1989) “Microsequencers for Proteins and Oligosaccharides,” Bio/Technol. 7:1075-1076. Although advanced instrumentation for protein sequencing has been developed, see, e.g., Frank et al. (1989) “Automation of DNA Sequencing Reactions and Related Techniques: A Work Station for Micromanipulation of Liquids,” Bio/Technol. 6:1211-1213, this technology utilizes a homogeneous and isolated protein sample for determination of removed residues from that homogeneous sample.

Likewise, in nucleic acid sequencing technology, three major methods for sequencing have been developed, of which two are commonly used today. See, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d Ed.) Vols. 1-3, Cold Spring Harbor Press, New York, which is hereby incorporated herein by reference. The first method was developed by Maxam and Gilbert. See, e.g., Maxam and Gilbert (1980) “Sequencing End-Labeled DNA with Base-Specific Chemical Cleavages,” Methods in Enzymol. 65:499-560, which is hereby incorporated herein by reference. The polymer is chemically cleaved with a series of base-specific cleavage reagents thereby generating a series of fragments of various lengths. The various fragments, each resulting from a cleavage at a specific base, are run in parallel on a slab gel which resolves nucleic acids which differ in length by single nucleotides. A specific label allows detection of cleavages at all nucleotides relative to the position of the label.

This separation requires high resolution electrophoresis or some other system for separating nucleic acids of very similar size. Thus, the target nucleic acid to be sequenced must usually be initially purified to near homogeneity.

Sanger and Coulson devised two alternative methods for nucleic acid sequencing. The first method, known as the plus and minus method, is described in Sanger and Coulson (1975) J. Mol. Biol. 94:441-448, and has been replaced by the second method. Subsequently, Sanger and Coulson developed another improved sequencing method known as the dideoxy chain termination method. See, e.g., Sanger et al. (1977) “DNA Sequencing with Chain-Termination Inhibitors,” Proc. Natl. Acad. Sci. USA 74:5463-5467, which is hereby incorporated herein by reference. This method is based on the inability of 2′, 3′ dideoxy nucleotides to be elongated by a polymerase because of the absence of a 3′ hydroxyl group on the sugar ring, thus resulting in chain termination. Each of the separate chain terminating nucleotides is incorporated by a DNA polymerase, and the resulting terminated fragment is known to end with the corresponding dideoxy nucleotide. However, both of the Sanger and Coulson sequencing techniques usually require isolation and purification of the nucleic acid to be sequenced and separation of nucleic acid molecules differing in length by single nucleotides.

Both the polypeptide sequencing technology and the oligonucleotide sequencing technologies described above suffer from the requirement to isolate and work with distinct homogeneous molecules in each determination.

In the polypeptide technology, the terminal amino acid is sequentially removed and analyzed. However, the analysis is dependent upon only one single amino acid being removed, thus requiring the polypeptide to be homogeneous.

In the case of nucleic acid sequencing, the present techniques typically utilize very high resolution polyacrylamide gel electrophoresis. This high resolution separation uses both highly toxic acrylamide for the separation of the resulting molecules and usually very high voltages in running the electrophoresis. Both the purification and isolation techniques are highly tedious, time consuming and expensive processes.

Thus, a need exists for the capability of simultaneously sequencing many biological polymers without individual isolation and purification. Moreover, dispensing with the need to individually perform the high resolution separation of related molecules leads to greater safety, speed, and reliability. The present invention solves these and many other problems.

BRIEF SUMMARY OF THE INVENTION

The present invention provides the means to sequence hundreds, thousands or even millions of biological macromolecules simultaneously and without individually isolating each macromolecule to be sequenced. It also dispenses with the requirement, in the case of nucleic acids, of separating the products of the sequencing reactions on dangerous polyacrylamide gels. Because the methods are adaptable to automation, the cost and effort required in sequence analysis will be dramatically reduced.

This invention is most applicable, but not limited, to linear macromolecules. It also provides specific reagents for sequencing both oligonucleotides and polypeptides. It provides an apparatus for automating the processes described herein.

The present invention provides methods for determining the positions of polymers which terminate with a given monomer, where said polymers are attached to a surface having a plurality of positionally distinct polymers attached thereto, said method comprising the steps of:

-   labeling a terminal monomer in a monomer type specific manner; and
-   scanning said surface, thereby determining the positions of said label.

In one embodiment, the polymers are polynucleotides, and usually the labeling of the terminal marker comprises incorporation of a labeled terminal monomer selected from the group of nucleotides consisting of adenine, cytidine, guanidine and thymidine.

An alternative embodiment provides methods for concurrently determining which subset of a plurality of positionally distinct polymers attached to a solid substrate at separable locations terminates with a given terminal subunit, said method comprising the steps of:

-   mixing said solid substrate with a solution comprising a reagent, which selectively marks positionally distinct polymers which terminate with said given terminal subunit; and
-   determining with a detector which separable locations are marked, thereby determining which subset of said positionally distinct polymers terminated with said given terminal subunit.

In one version, the solution comprises a reagent which marks the positionally distinct polymer with a fluorescent label moiety. In another version the terminal subunit is selected from the group consisting of adenosine, cytosine, guanosine, and thymine.

Methods are also provided for determining which subset of a plurality of primer polynucleotides have a predetermined oligonucleotide, wherein the polynucleotides are complementary to distinctly positioned template strands which are attached to a solid substrate, said method comprising the steps of:

-   selectively marking said subset of primer polynucleotides having the predetermined oligonucleotide; and
-   detecting which polynucleotides are marked.

In one embodiment, the oligonucleotide subunit is a single nucleotide; in another the marking comprises elongating said primer with a labeled nucleotide which is complementary to a template; and in a further embodiment the marking step uses a polymerase and a blocked and labeled adenine.

The invention embraces methods for concurrently obtaining sequence information on a plurality of polynucleotides by use of a single label detector, said method comprising the steps of:

-   attaching a plurality of positionally distinct polynucleotides to a solid substrate at separable locations;
-   labeling said plurality of polynucleotides with a terminal nucleotide specific reagent, said label being detectable using said label detector; and
-   determining whether said specific labeling reagent has labeled each separable location.

Often, the labeling is performed with reagents which can distinguishably label alternative possible nucleotide monomers. One embodiment uses four replica substrates each of which is labeled with a specific labeling reagent for adenine, cytosine, guanine, or thymine. Usually, the labeling and determining steps are performed in succession using reagents specific for each of adenine, cytosine, guanine, and thymine monomers.

An alternative embodiment provides methods for concurrently obtaining sequence information on a plurality of polynucleotides, said method comprising the steps of:

-   attaching distinct polynucleotides to a plurality of distinct solid substrates;
-   labeling said plurality of solid substrates with a terminal nucleotide specific labeling reagent; and
-   determining whether said specific labeling reagent has labeled each distinct substrate.

The method can be performed using a continuous flow of distinct solid substrates through a reaction solution.

A method is provided for simultaneously sequencing a plurality of polymers made up of monomer units, said plurality of polymers attached to a substrate at definable positions, said method comprising the steps of:

-   mixing said substrate with a reagent which specifically recognizes a terminal monomer, thereby providing identification among various terminal monomer units;
-   scanning said substrate to distinguish signals at definable positions on said substrate; and
-   correlating said signals at defined positions on said substrate to provide sequential series of sequence determinations.

Often, the plurality of polymers are synthesized by a plurality of separate cell colonies, and the polymers may be attached to said substrate by a carbonyl linkage. In one embodiment, the polymers are polynucleotides, and often the substrate comprises silicon. The scanning will often identify a fluorescent label. In one embodiment, the reagent exhibits specificity of removal of terminal monomers; in another, the reagent exhibits specificity of labeling of terminal monomers.

The invention also embraces methods for sequencing a plurality of distinctly positioned polynucleotides attached to a solid substrate comprising the steps of:

-   hybridizing complementary primers to said plurality of polynucleotides;
-   elongating a complementary primer hybridized to a polynucleotide by adding a single nucleotide; and
-   identifying which of said complementary primers have incorporated said nucleotide.

In some versions, the elongating step is performed simultaneously on said plurality of polynucleotides linked to said substrate. Typically, the substrate is a two dimensional surface and the identifying results from a positional determination of the complementary primers incorporating the single defined nucleotide. A silicon substrate is useful in this method.

Methods are provided where the linking is by photocrosslinking polynucleotide to said complementary primer, where said primer is attached to said substrate. The elongating will often be catalyzed by a DNA dependent polymerase. In various embodiments, a nucleotide will have a removable blocking moiety to prevent further elongation, e.g., NVOC.

A nucleotide with both a blocking moiety and labeling moiety will often be used.

A further understanding of the nature and advantages of the invention herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a simplified and schematized embodiment of a degradative scheme for polymer sequencing.

FIG. 2 illustrates a simplified and schematized embodiment of a synthetic scheme for polymer sequencing.

FIG. 3 illustrates a coordinate mapping system of a petri plate containing colonies. Each position of a colony can be assigned a distinct coordinate position.

FIG. 4 illustrates various modified embodiments of the substrates.

FIG. 5 illustrates an idealized scanning result corresponding to a particular colony position.

FIG. 6 illustrates particular linkers useful for attaching a nucleic acid to a silicon substrate. Note that thymine may be substituted by adenine, cytidine, guanine, or uracil.

FIG. 7 illustrates an embodiment of the scanning system and reaction chamber.

FIG. 8 illustrates the application of the synthetic scheme for sequencing as applied to a nucleic acid cluster localized to a discrete identified position. FIG. 8A illustrates schematically, at a molecular level, the sequence of events which occur during a particular sequencing cycle. FIG. 8B illustrates, in a logic flow chart, how the scheme is performed.

FIG. 9 illustrates the synthesis of a representative nucleotide analog useful in the synthetic scheme. Note that the FMOC may be attached to adenine, cytosine, or guanine.

FIG. 10 illustrates the application of the degradative scheme for sequencing as applied to a nucleic acid cluster localized to a discrete identified position. FIG. 10A illustrates schematically, at a molecular level, the sequence of events which occur during a particular sequencing cycle. FIG. 10B illustrates in a logic flow chart how the scheme is performed.

FIG. 11 illustrates a functionalized apparatus for performing the scanning steps and sequencing reaction steps.

DETAILED DESCRIPTION OF THE INVENTION

I. Sequencing Procedure for a Generic Polymer

A. Overview

    1. Substrate and matrix
    2. Scanning system
    3. Synthetic/degradative cycles
    4. Label
    5. Utility

B. Substrate/Matrix

    1. Non-distortable
    2. Attachment of polymer

C. Scanning system

    1. Mapping to distinct position
    2. Detection system
    3. Digital or analog signal

D. Synthetic or degradative cycle

    1. Synthetic cycles
        a. synthetic scheme
        b. blocking groups
    2. Degradative cycles
    3. Conceptual principles

E. Label

    1. Attachment
    2. Mode of detection

F. Utility

II. Specific Embodiments

A. Synthetic method

B. Chain degradation method

III. Apparatus

I. SEQUENCING PROCEDURE FOR A GENERIC POLYMER

The present invention provides methods and apparatus for the preparation and use of a substrate having a plurality of polymers with various sequences where each small defined contiguous area defines a small cluster of homogeneous polymer sequences. The invention is described herein primarily with regard to the sequencing of nucleic acids but may be readily adapted to the sequencing of other polymers, typically linear biological macromolecules. Such polymers include, for example, both linear and cyclical polymers or nucleic acids, polysaccharides, phospholipids, and peptides having various different amino acids, heteropolymers in which the polymers are mixed, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, polyacetates or mixed polymers of various sorts. In a preferred embodiment, the present invention is described in the use of sequencing nucleic acids.

Various aspects of U.S. Ser. No. 07/362,901 (VLSIPS parent); U.S. Ser. No. 07/492,462, now U.S. Pat. No. 5,143,854 (VLSIPS CIP); U.S. Ser. No. 07/435,316 (caged biotin parent); U.S. Ser. No. 07/612,671 (caged biotin CIP); and simultaneously filed cases U.S. Ser. No. 07/624,114 (sequencing by hybridization) and U.S. Ser. No. 07/624,120, a divisional of which has issued as U.S. Pat. No. 5,744,305 (automated VLSIPS), each of which is incorporated herein by reference, are applicable to the substrates and matrix materials described herein, to the apparatus used for scanning the matrix arrays, to means for automating the scanning process, and to the linkage of polymers to a substrate.

By use of masking technology and photosensitive synthetic subunits, the VLSIPS apparatus allows for the stepwise synthesis of polymers according to a positionally defined matrix pattern. Each oligonucleotide probe will be synthesized at known and defined positional locations on the substrate. This forms a matrix pattern of known relationship between position and specificity of interaction. The VLSIPS technology allows a very large number of different oligonucleotide probes to be simultaneously and automatically synthesized, including numbers in excess of about 10², 10³, 10⁴, 10⁵, 10⁶, or even more, and at densities of at least about 10²/cm², 10³/cm², 10⁴/cm², 10⁵/cm² and up to 10⁶/cm² or more. This application discloses methods for synthesizing polymers on a silicon or other suitably derivatized substrate, methods and chemistry for synthesizing specific types of biological polymers on those substrates, apparatus for scanning and detecting whether interaction has occurred at specific locations on the substrate, and various other technologies related to the use of a high density very large scale immobilized polymer substrate. At a size of about 30 microns by 30 microns, one million regions would take about 11 centimeters square or a single wafer of about 4 centimeters by 4 centimeters. Thus the present technology provides for making a single matrix of that size having all one million plus possible oligonucleotides. Region size is sufficiently small to correspond to densities of at least about 5 regions/cm², 20 regions/cm², 50 regions/cm², 100 regions/cm², and greater, including 300 regions/cm², 1000 regions/cm², 3K regions/cm², 10K regions/cm², 30K regions/cm², 100K regions/cm², 300K regions/cm² or more, even in excess of one million regions/cm².
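By way of illustration only, the area and density figures above can be checked with a short calculation. The sketch assumes square regions tiled without gaps; the function names and the `MICRONS_PER_CM` constant are hypothetical and are not drawn from the referenced applications. Gap-free tiling of one million 30 micron regions gives about 9 cm², so the figure of about 11 stated above, if read as an area, presumably allows for some spacing between regions. Either value fits within a wafer of about 4 centimeters by 4 centimeters (16 cm²).

```python
# Illustrative arithmetic only. The 30 micron region size and the count of
# one million regions come from the text; gap-free square tiling is assumed.
MICRONS_PER_CM = 10_000


def array_area_cm2(region_side_um: float, n_regions: int) -> float:
    """Area in cm^2 occupied by n_regions square regions tiled with no gaps."""
    side_cm = region_side_um / MICRONS_PER_CM
    return n_regions * side_cm ** 2


def density_per_cm2(region_side_um: float) -> float:
    """Regions per cm^2 for gap-free tiling of square regions."""
    side_cm = region_side_um / MICRONS_PER_CM
    return 1.0 / side_cm ** 2


area = array_area_cm2(30, 1_000_000)   # roughly 9 cm^2
density = density_per_cm2(30)          # on the order of 10^5 regions/cm^2
wafer_area = 4 * 4                     # 4 cm by 4 cm wafer, in cm^2
```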

A. Overview

The present invention is based, in part, on the ability to perform a stepwise series of reactions which either extend or degrade a polymer by defined units.

FIG. 1 schematizes a simplified linear two monomer polymer made up of A type and B type subunits. A degradative scheme is illustrated. Panel A depicts a matrix with two different polymers located at positions 10 and 14, but with no polymer linked at position 12. A reaction is employed to label all of these polymers at the terminus opposite the attachment of the polymer. Panel B illustrates a label (designated by an asterisk) incorporated at position 16 on the terminal monomers. A scan step is performed to locate positions 10 and 14 where polymers have been linked, but no polymer is located at position 12. The entire matrix is exposed to a reagent which is specific for removing single terminal A monomers, which are also labeled. The reagent is selected to remove only a single monomer; it will not remove further A monomers. Removal of the labeled A monomer leaves a substrate as illustrated in panel C. A scan step is performed and compared with the previous scan, indicating that the polymer located at position 10 has lost its label, i.e., that the polymer at 10 terminated with an A monomer. The entire matrix is then exposed to a second reagent which is specific for removing terminal B monomers which are also labeled. Note that only a single B on each polymer is removed and that successive B monomers are not affected. Removal of the labeled B monomer leaves a substrate as illustrated in panel D. Another scan step is performed, indicating that the polymer located at position 14 has lost its label, i.e., it terminated with a B monomer. The sequence of treatments and scans is repeated to determine the successive monomers. It will be recognized that if the labeled A and B are distinguishable, i.e., the label on polymers at sites 10 and 14 may be distinguished, a single removal step can be performed to convert the substrate as illustrated in panel B directly to that illustrated in panel D.
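The label, remove, and scan cycle of the degradative scheme may be sketched in software, purely by way of illustration. The following is a hypothetical model, not part of the disclosed apparatus: the function name, the dictionary layout, and the example polymers are assumptions. Each polymer is written attachment end first, so the free terminus is the last character, and a position that loses its label after a monomer-specific reagent is scored as having terminated with that monomer.

```python
# Hypothetical model of the degradative scheme; all names are illustrative.
def degradative_sequencing(matrix, monomer_types):
    """Return position -> list of monomers in the order they were removed.

    matrix maps a position to a polymer string written attachment end first,
    so the free (labeled) terminus is the last character.
    """
    polymers = {pos: list(p) for pos, p in matrix.items()}
    reads = {pos: [] for pos in matrix}
    cycles = max((len(p) for p in matrix.values()), default=0)
    for _ in range(cycles):
        # Label step: every remaining polymer is labeled at its free terminus.
        labeled = {pos for pos, p in polymers.items() if p}
        for monomer in monomer_types:
            # The reagent removes one labeled terminal monomer of this type.
            for pos in sorted(labeled):
                if polymers[pos][-1] == monomer:
                    polymers[pos].pop()
                    labeled.discard(pos)  # the scan shows the label was lost
                    reads[pos].append(monomer)
    return reads


# Positions 10 and 14 carry polymers; position 12 carries none.
example_matrix = {10: "BA", 12: "", 14: "AB"}
example_reads = degradative_sequencing(example_matrix, ["A", "B"])
```

Because a position is dropped from the labeled set once its terminal monomer is removed, the next monomer on that polymer, which carries no label, is untouched until the following cycle. This mirrors the requirement that the reagent not remove successive monomers.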

An alternative embodiment employs synthetic reactions where a synthetic product is made at the direction of the attached polymer. The method is useful in the synthesis of a complementary nucleic acid strand by elongation of a primer as directed by the attached polymer.

FIG. 2 illustrates a similar simplified polymer scheme, where the A and B monomer provide a complementary correspondence to A′ and B′ respectively. Thus, an A monomer directs synthetic addition of an A′ monomer and a B monomer directs synthetic addition of a B′ monomer. Panel A depicts polymers attached at locations 18 and 22, but not at location 20. Each polymer already has one corresponding complementary monomer A′. The matrix, with polymers, is subjected to an elongation reaction which incorporates, e.g., single labeled A′ monomers 24 but not B′ monomers, as depicted in panel B. The label is indicated by the asterisk. Note that only one A′ monomer is added. A scan step is performed to determine whether polymers located at positions 18 or 22 have incorporated the labeled A′ monomers. The polymer at position 18 has, while the polymer at position 22 has not. Another elongation reaction which incorporates labeled B′ monomers 26 is performed resulting in a matrix as depicted in panel C. Again note that only one, and not successive B′ monomers, is added. Another scan is performed to determine whether a polymer located at sites 18 or 22 has incorporated a labeled B′ monomer, and the result indicates that the polymer located at site 22 has incorporated the labeled B′ monomer. A next step removes all of the labels to provide a substrate as depicted in panel D. As before, if the polymer which incorporated a labeled A′ monomer is distinguishable from a polymer which incorporated a labeled B′ monomer, the separate elongation reactions may be combined producing a panel C type matrix directly from a panel A type matrix and the scan procedure can distinguish which terminal monomer was incorporated.
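The synthetic scheme may likewise be modeled in a few lines, offered only as a sketch under stated assumptions. The `COMPLEMENT` table, the function name, and the example templates are hypothetical. Each template is written in the direction of synthesis, the first `primed` monomers are taken as already paired, and a position that has accepted one labeled monomer in a cycle is treated as blocked until the label and blocking group are removed.

```python
# Hypothetical model of the synthetic scheme; all names are illustrative.
COMPLEMENT = {"A": "A'", "B": "B'"}


def synthetic_sequencing(templates, primed=1):
    """Return position -> list of complementary monomers in order of addition.

    templates maps a position to a template string read in the direction of
    synthesis; `primed` template monomers are assumed to be paired already.
    """
    reads = {pos: [] for pos in templates}
    cursor = {pos: primed for pos in templates}
    cycles = max((len(t) - primed for t in templates.values()), default=0)
    for _ in range(cycles):
        extended = set()  # positions blocked after one addition this cycle
        for monomer, partner in COMPLEMENT.items():
            # Elongation with one labeled, blocked monomer type, then a scan.
            for pos, template in templates.items():
                i = cursor[pos]
                if pos in extended or i >= len(template):
                    continue
                if template[i] == monomer:
                    reads[pos].append(partner)  # label detected at this site
                    cursor[pos] = i + 1
                    extended.add(pos)
        # Labels and blocking groups are removed before the next cycle.
    return reads


example_templates = {18: "AAB", 22: "ABA"}
example_synthetic_reads = synthetic_sequencing(example_templates)
```

In the first cycle of this example the polymer at position 18 incorporates A′ and the polymer at position 22 incorporates B′, as in the scheme described above.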

It will be appreciated that the process may be applied to more complicated polymers having more different types of monomers. Also, the number of scan steps can be minimized if the various possible labeled monomers can be differentiated by the detector system.

Typically, the units will be single monomers, though under certain circumstances the units may comprise dimers, trimers, or longer segments of defined length. In fact, under certain circumstances, the method may be operable in removing or adding different sized units so long as the units are distinguishable. However, it is very important that the reagents used do not remove or add successive monomers. This is achieved in the degradative method by use of highly specific reagents. In the synthetic mode, this is often achieved with removable blocking groups which prevent further elongation.

One important aspect of the invention is the concept of using a substrate having homogeneous clusters of polymers attached at distinct matrix positions. The term “cluster” refers to a localized group of substantially homogeneous polymers which are positionally defined as corresponding to a single sequence. For example, a coordinate system will allow the reproducible identification and correlation of data corresponding to distinct homogeneous clusters of polymers locally attached to a matrix surface. FIG. 3 illustrates a mapping system providing such a correspondence, where transfer of polymers produced by a colony of organisms to a matrix preserves spatial information thereby allowing positional identification.

In one embodiment, bacterial colonies producing polymers are spatially separated on the media surface of a petri plate as depicted in panel A. Alternatively, phage plaques on a bacterial lawn can exhibit a similar distribution. A portion of panel A is enlarged and shown in panel B. Individual colonies are labeled C1-C7. The position of each colony can be mapped to positions on a coordinate system, as depicted in panel C. The positions of each colony can then be defined, as in a table shown in panel D, which allows reproducible correlation of scan cycle results.
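As a minimal sketch of such a position table, the following assigns each colony to a cell of a square grid. The colony centers, the grid pitch, and the function name are all hypothetical values chosen for illustration; the specification does not prescribe any particular coordinate convention.

```python
# Hypothetical position table for colonies; values are illustrative only.
def build_position_table(colony_centers, grid_pitch):
    """Map each colony label to the (column, row) grid cell holding its center."""
    table = {}
    for label, (x, y) in colony_centers.items():
        cell = (int(x // grid_pitch), int(y // grid_pitch))
        if cell in table.values():
            # Two colonies in one cell could not be told apart by the scanner.
            raise ValueError(f"colonies overlap in cell {cell}")
        table[label] = cell
    return table


example_centers = {"C1": (1.2, 0.4), "C2": (3.7, 2.1), "C3": (0.3, 4.8)}
example_table = build_position_table(example_centers, grid_pitch=1.0)
```

A table of this kind lets the result of every scan cycle be keyed to the same colony label, which is the correlation the mapping is meant to support.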

Although the preferred embodiments are described with respect to a flat matrix, the invention may also be applied using the means for correlating detection results from multiple samples after passage through batch or continuous flow reactions. For example, spatially separated polymers may be held in separate wells on a microtiter plate. The polymers will be attached to a substrate to retain the polymers as the sequencing reagents are applied and removed.

The entire substrate surface, with homogeneous clusters of polymers attached at defined positions, may be subjected to batch reactions so the entire surface is exposed to a uniform and defined sequence of reactions. As a result, each cluster of target polymers for sequencing will be subjected to similar reactive chemistry. By monitoring the results of these reactions on each cluster localized to a defined coordinate position, the sequence of the polymer which is attached at that site will be determined.

FIG. 4, panel A illustrates solid phase attached polymers linked to particles 32 which are individually sequestered in separate wells 34 on a microtiter plate. The scanning system will separately scan each well. FIG. 4 panel B illustrates marbles 36 to which polymers are attached. The marbles are automatically fed in a continuous stream through the reaction reagents 38 and past a detector 40. The marbles may be carefully held in tubes or troughs which prevent the order of the beads from being disturbed. In a combination of the two embodiments, each polymer is attached to a plurality of small marbles, and marbles having each polymer are separated, but retained in a known order. Each marble is, in batch with a number of analogous marbles having other polymers linked individually to them, passed through a series of reagents in the sequencing system. For example, A2, B2, and C2 are subjected to sequencing reactions in batch, with label incorporated only for the second monomer. A3, B3, and C3 are likewise treated to determine the third monomer. Likewise for A_n, B_n, and C_n. However, within each batch, the detection will usually occur in the order A, B, and C, thereby providing for correlation of successive detection steps for the A polymer beads, for the B polymer beads, and for the C polymer beads.

FIG. 5 illustrates a signal which might result from a particular defined position. Panel A illustrates the position of a given colony relative to the positions corresponding to the positional map. The scan system will typically determine the amount of signal, or type of signal, at each position of the matrix. The scan system will adjust the relationship of the detector and the substrate to scan the matrix in a controllable fashion. An optical system with mirrors or other elements may allow the relative positions of the substrate and detection to be fixed. The scanner can be programmed to scan the entire substrate surface in a reproducible manner, or to scan only those positions where polymer clusters have been localized. A digital data map, panel B, can be generated from the scan step.
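One simple way to derive a digital data map from scan readings is to threshold the signal at each position. The following sketch assumes that approach; the threshold value, the grid of readings, and the function names are hypothetical, and a real detector may call positions by other criteria.

```python
# Hypothetical conversion of analog scan readings to a digital data map.
def digital_map(signal, threshold):
    """Convert a 2-D grid of signal readings to 1 (labeled) or 0 (unlabeled)."""
    return [[1 if value >= threshold else 0 for value in row] for row in signal]


def labeled_positions(bitmap):
    """List the (row, column) positions called as labeled."""
    return [
        (r, c)
        for r, row in enumerate(bitmap)
        for c, bit in enumerate(row)
        if bit
    ]


example_signal = [
    [0.02, 0.91, 0.05],
    [0.03, 0.04, 0.88],
]
example_bitmap = digital_map(example_signal, threshold=0.5)
example_hits = labeled_positions(example_bitmap)
```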

Thus, instead of subjecting each individual and separated polymer to the series of reactions as a homogeneous sample, a whole matrix array of different polymers targeted for sequencing may be exposed to a series of chemical manipulations in a batch format. A large array of hundreds, thousands, or even millions of spatially separated homogeneous regions may be simultaneously treated by defined sequencing chemistry.

The use of a coordinate system which can reproducibly assay a defined position after each reaction cycle can be advantageously applied according to this invention. For example, a colony plaque lift of polymers can be transferred onto a nitrocellulose filter or other substrate. A scanning detector system will be able to reproducibly monitor the results of chemical reactions performed on the target polymers located at the defined locations of particular clones. An accurate positioning can be further ensured by incorporating various alignment marks on the substrate.

The use of a high resolution system for monitoring the results of successive sequencing steps provides the possibility for correlating the scan results of each successive sequencing reaction at each defined position.

The invention is dependent, in part, upon the stepwise synthesis or degradation of the localized polymers as schematized in FIGS. 1 and 2. The synthetic scheme is particularly useful on nucleic acids which can be synthesized from a complementary strand. Otherwise, a stepwise degradation scheme may be the preferred method. Although single monomer cycles of synthesis or degradation will usually be applicable, in certain cases the technology will be workable using larger segments, e.g., dimers or trimers, in the cyclic reactions.

The present invention also provides methods for production or selection of monomer-specific degradative reagents based upon catalytic antibody constructs. Antibody binding sites exhibiting specificity for binding particular terminal monomers can be linked to cleavage reagents or active sites of cleavage enzymes. Thus, reagents which are specific for particular terminal nucleotides may function to remove them in a specific fashion.

The invention also makes use of a means for detecting or labeling the polymers. Particular sequencing chemistry can be selected for specificity in reacting with terminal monomer units. Alternatively, indirect labeling methods may be applied which can distinguish between different terminal monomers. Another alternative scheme allows for terminal labeling which is not monomer-specific, but with the determination of the monomer based upon specificity of post-label reagents or upon monomer-distinguishable labels. Suitable such reagents will be antibodies or other reagents having specificity for distinguishing between different labeled terminal monomer residues and cleaving only those labeled monomer residues.

Thus, although neither the reaction nor the label need necessarily be specific, at least one of the pair must be specific. A comparison of label signal before and after a reaction allows determination of the change in label signal after monomer specific reactions are performed, and thereby provides the means to deduce the identity of the monomer at a given position.
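The before-and-after comparison described above can be sketched as follows. This is a minimal illustration only: the function name, the sequential application of monomer-specific reactions, and the 50% signal-drop threshold are assumptions made for the example and are not taken from the specification.

```python
def deduce_monomer(initial_signal, steps, min_drop=0.5):
    """Identify the terminal monomer at one position from label signal changes.

    initial_signal: label signal before any monomer-specific reaction.
    steps: list of (monomer, signal_after) pairs, in the order the
           monomer-specific reactions were applied (hypothetical format).
    min_drop: fractional loss of signal treated as a positive result
              (assumed value, to be tuned for a real detector).
    """
    previous = initial_signal
    for monomer, signal in steps:
        # Compare the signal after this reaction with the signal before it.
        if previous > 0 and (previous - signal) / previous >= min_drop:
            return monomer
        previous = signal
    # No reaction changed the signal enough to make a call.
    return None
```

For example, a position whose signal stays near its starting value after the A-specific and C-specific reactions but falls sharply after the G-specific reaction would be called as G.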

B. Substrate/Matrix

The substrate or matrix has relatively few constraints on its composition. Preferably, the matrix will be inert to the sequencing reactions to which the polymers attached thereto will be subjected. Typically, a silicon or glass substrate will be used, but other suitable matrix materials include ceramics, or plastics, e.g., polycarbonate, polystyrene, delrin, and cellulose, and any other matrix which satisfies these functional constraints.

In one embodiment, the matrix should be sufficiently nondeformable that the scanning system can reproducibly scan the matrix and reliably correlate defined positions with earlier and later scan operations. However, by including alignment markings on the substrate, the need for absolute rigidity of the substrate may be reduced.

In an alternative embodiment, the matrix may merely be large enough that attached polymer may be separated from a liquid phase containing the sequencing reagents. In this embodiment, a single detection unit is used to analyze the label in a multiplicity of different samples after each of the reaction steps. Thus, different samples may be separably treated in distinct wells of a microtiter dish.

Separate homogeneous polymers can be introduced to solid phase beads in each microtiter well. Sequencing reagents may be individually introduced separately into each well, or transferred from well to well with the polymers remaining in the correct well due to their solid phase attachments.

In an alternative approach, the solid phase matrix may be marbles or other particularly shaped articles. Spherical shapes, solid or hollow, are preferred because they can be easily transported through troughs or tubing which retains their relative orders. By feeding a succession of beads through appropriate reaction baths and past a detector in a known and retained order, a succession of label detection results from a bead may be correlated and converted into a polymer sequence.
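Because the beads pass the detector in a retained order, the detection results form a single stream that can be separated by bead. The following is a minimal sketch of that bookkeeping, assuming (hypothetically) that each pass past the detector yields exactly one monomer call per bead and that every bead completes the same number of cycles.

```python
def sequences_from_stream(calls, n_beads):
    """Group an ordered stream of monomer calls into one sequence per bead.

    calls: monomer calls in detection order; the call for bead i in
           cycle k is assumed to sit at index k * n_beads + i.
    n_beads: number of beads travelling in a fixed, retained order.
    """
    if n_beads <= 0 or len(calls) % n_beads != 0:
        raise ValueError("stream length must be a whole number of passes")
    # Every n_beads-th call, starting at offset i, belongs to bead i.
    return ["".join(calls[i::n_beads]) for i in range(n_beads)]
```

If the bead order were ever disturbed, this indexing would no longer hold, which is why the troughs or tubing must retain the relative order.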

The attachment of homogeneous clusters of target polymers to the substrate can be achieved by appropriate linkage chemistry. As indicated before, the linkage should be stable and insensitive to the sequencing reagents used. The specific linkages will depend, of course, upon the particular combination of substrate and polymer being used.

Typically, the most useful chemical moieties which will be used are amines. Typical substrate derivatized groups include aminopropyltriethoxysilane, hydroxypropylacylate, or hydroxy reagents, see, e.g., U.S. Ser. No. 07/624,120 (automated VLSIPS). Typical polymer derivatized groups include nitroveratryl and nitroveratryl oxycarbonyl. Linkage types are also illustrated and detailed in U.S. Ser. No. 07/624,120 (automated VLSIPS) and U.S. Ser. No. 07/624,114 (sequencing by hybridization).

FIG. 6 illustrates one preferred linkage chemistry for nucleic acids. An NVO-derivatized nucleotide is made as described in U.S. Ser. No. 07/624,120 (automated VLSIPS). The specific conditions for synthesis of thymidine are described therein and are adaptable to other nucleotides and nucleosides. The nucleoside analog is further derivatized with an appropriate R group at the 3′ hydroxyl. Preferred R groups are indicated in FIG. 6. The linkage produces a photosensitive blocked nucleoside suitable for phosphoramidite synthesis of further polynucleotides which can serve as a complementary strand for hybridization of other polymers. The hybrids of the complementary strands may be covalently crosslinked using acridine dyes or other intercalative reagents, e.g., psoralen. See, e.g., Kornberg (1980) DNA Replication, Freeman, San Francisco; Wiesehahn, et al. (1978) Proc. Natl. Acad. Sci. USA 75:2703-2707, and Sheldon (1986) U.S. Pat. No. 4,582,789, which are each incorporated herein by reference.

The linkage should be substantially inert to the cyclic sequencing reactions and scan cycles. Usually, the linkage will be at a defined and homogeneous polymer position, preferably at the end opposite where the sequencing chemistry takes place. Although the type of linkage is dependent upon the polymer being sequenced, various types of polymers have preferred linkages. For polypeptides, amino terminal or carboxyl terminal linkages will be preferred. Specific amino terminal linkages include amino butyric acid, amino caproic acids, and similar carboxylic acids. Specific carboxyl terminal linkages include butyric acid, caproic acid, and other carboxylic acids, hydrocarbon, and ethers. See now abandoned U.S. Ser. No. 07/435,316, filed Nov. 13, 1989 (VLSIPS parent), and U.S. Ser. No. 07/492,462, filed Mar. 7, 1990, now U.S. Pat. No. 5,143,854 (VLSIPS CiP), which are incorporated herein by reference. For nucleic acids, the linkages will typically be either 5′ or 3′ linkages. Suitable 3′ linkages include those illustrated in FIG. 6, and others described in U.S. Ser. No. 07/624,114 (sequencing by hybridization).

Alternatively, for complementary polymers, particularly nucleic acids, linkage may be via crosslinkage of the complementary polymers where the complementary strand is directly attached to the matrix. Acridine dyes or other intercalative reagents, e.g., psoralen, or a similar crosslinking agent between the strands can be used. See, e.g., Dattagupta, et al., “Coupling of Nucleic Acids to Solid Support By Photochemical Methods,” U.S. Pat. No. 4,713,326; and U.S. Pat. No. 4,542,102; and Chatterjee, M. et al. (1990) J. Am. Chem. Soc. 112:6397; which describe useful crosslinking reagents, and are hereby incorporated herein by reference.

For polynucleotides, the preferred attachment to the matrix is through a synthetic oligomer by the 5′ end of each target sequence. This oligomer is designed to anneal to the desired target templates used in a synthetic system or to the polynucleotide used in the degradation approach. In one embodiment, a vector sequence which is complementary to the immobilized oligonucleotide is incorporated adjacent the cloning inserts, thereby providing a common complementary sequence for each insert. In particular, a cloning vector will be selected with a defined sequence adjacent the insert. See, e.g., Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, Vols. 1-3, Cold Spring Harbor Press, which is hereby incorporated herein by reference. This defined sequence is used, in some embodiments, as a common linker for all of the vector inserts. The inserts, adjacent to this linker, will be transferable by hybridization to the matrix linked complementary sequences. The hybrids are crosslinked by addition of a suitable crosslinker under appropriate conditions, for example, photocrosslinking by psoralen with UV light. See, e.g., Song et al. (1979) Photochem. Photobiol. 29:1177-1197; Cimino et al. (1985) Ann. Rev. Biochem. 54:1151-1193; and Parsons (1980) Photochem. Photobiol. 32:813-821; each of which is incorporated herein by reference. Using these approaches, the oligonucleotide linker serves as both the attachment linker and the polymerization primer.

FIG. 6 illustrates a preferred 3′ terminal linkage designed for a phosphoramidite linkage of a synthetic primer, and the reactions forming it. The chemical reactions for actually performing the linkage will be similar to those used for oligonucleotide synthesis instruments using phosphoramidite or similar chemistry. Applied Biosystems, Foster City, Calif. supplies oligonucleotide synthesizers.

C. Scanning System

The scanning system should be able to reproducibly scan the substrate. Where appropriate, e.g., for a two dimensional substrate where the polymers are localized to positions thereon, the scanning system should positionally define the clusters attached thereon to a reproducible coordinate system. It is important that the positional identification of clusters be repeatable in successive scan steps. Functionally, the system should be able to define physical positions to a coordinate system as described above and illustrated in FIGS. 3 and 4.

In alternative embodiments, the system can operate on a cruder level by separately detecting separate wells on a microtiter plate, or by scanning marbles which pass by the detector in an embodiment as described above and illustrated in FIG. 4.

The scanning system would be similar to those used in electro-optical scanning devices. See, e.g., the fluorescent detection device described in U.S. Ser. No. 07/492,462, now U.S. Pat. No. 5,143,854 (VLSIPS CiP), and U.S. Ser. No. 07/624,120 (automated VLSIPS). The system could exhibit many of the features of photographic scanners, digitizers or even compact disk reading devices. For example, a model no. PM500-A1 x-y translation table manufactured by Newport Corporation can be attached to a detector unit. The x-y translation table is connected to and controlled by an appropriately programmed digital computer such as an IBM PC/AT or AT compatible computer. The detection system can be a model no. R943-02 photomultiplier tube manufactured by Hamamatsu, attached to a preamplifier, e.g., a model no. SR440 manufactured by Stanford Research Systems, and to a photon counter, e.g., an SR430 manufactured by Stanford Research Systems, or a multichannel detection device. Although a digital signal may usually be preferred, there may be circumstances where analog signals would be advantageous.

The stability and reproducibility of the positional localization in scanning will determine, to a large extent, the resolution for separating closely positioned polymer clusters in a 2 dimensional substrate embodiment. Since the successive monitoring at a given position depends upon the ability to map the results of a reaction cycle to its effect on a positionally mapped cluster of polymers, high resolution scanning is preferred. As the resolution increases, the upper limit to the number of possible polymers which may be sequenced on a single matrix will also increase. Crude scanning systems may resolve only on the order of 1000 μm; refined scanning systems may resolve on the order of 100 μm; more refined systems may resolve on the order of about 10 μm; with optical magnification systems a resolution on the order of 1.0 μm is available; and more preferably a resolution on the order of better than 0.01 μm is desired. The resolution may be diffraction limited, and advantages may arise from using shorter wavelength radiation for the photo-optical deprotection or fluorescent scanning steps. However, with increased resolution, the time required to fully scan a matrix will be increased and a compromise between speed and resolution will necessarily be selected. Parallel detection devices which will provide high resolution with shorter scan times will be applicable where multiple detectors will be moved in parallel.
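The trade-off between resolution, site count and scan time can be made concrete with a short calculation. The sketch below assumes, purely for illustration, a square substrate, one cluster per resolution element, and a fixed dwell time per position; none of these figures come from the specification.

```python
def max_sites(side_um, resolution_um):
    """Upper bound on resolvable cluster sites on a square substrate,
    assuming one site per resolution element along each axis."""
    per_side = int(side_um // resolution_um)
    return per_side * per_side


def full_scan_seconds(side_um, resolution_um, dwell_s):
    """Time to visit every resolvable site once with a single detector,
    ignoring stage travel time (a simplifying assumption)."""
    return max_sites(side_um, resolution_um) * dwell_s


# A 1 cm x 1 cm substrate (10,000 um on a side) at the resolutions named above.
for res_um in (1000, 100, 10, 1):
    print(res_um, max_sites(10_000, res_um))
```

Under these assumptions, each tenfold improvement in resolution raises the site count, and therefore the single-detector scan time, a hundredfold, which is the motivation for parallel detection.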

With other embodiments, resolution often is not so important and sensitivity might be emphasized. However, the reliability of a signal may be pre-selected by counting photons and continuing to count for a longer period at positions where intensity of signal is lower. Although this will decrease scan speed, it can increase reliability of the signal determination. Various signal detection and processing algorithms may be incorporated into the detection system, such as described in U.S. Ser. No. 07/624,120 (automated VLSIPS). In one embodiment, the distribution of signal intensities of pixels across the region of signal is evaluated to determine whether the distribution of intensities corresponds to a true positive signal.
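One way to pre-select reliability is to fix a target counting precision and let the dwell time vary with the observed count rate. The sketch below assumes Poisson photon statistics, in which the relative uncertainty of N counts is about 1/sqrt(N), and ignores background counts; the function name, the default precision and the dwell cap are hypothetical.

```python
def dwell_time_s(count_rate_hz, rel_error=0.1, max_dwell_s=1.0):
    """Counting time needed at one position to reach a target precision.

    count_rate_hz: estimated photon count rate at this position.
    rel_error: desired relative uncertainty (Poisson assumption: 1/sqrt(N)).
    max_dwell_s: cap so that dark positions do not stall the scan.
    """
    if count_rate_hz <= 0:
        return max_dwell_s
    # Number of photons required for the requested relative uncertainty.
    counts_needed = 1.0 / rel_error ** 2
    return min(counts_needed / count_rate_hz, max_dwell_s)
```

Dim positions therefore receive proportionally longer counting periods, up to the cap, at the cost of overall scan speed.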

The detection system for the signal or label will depend upon the label used, which may be defined by the chemistry available. For optical signals, an optical fiber, a charge-coupled device (CCD), or a combination of the two may be used in the detection step. In those circumstances where the matrix is itself transparent to the radiation used, it is possible to have an incident light beam pass through the substrate with the detector located opposite the substrate from the polymers. For electromagnetic labels, various forms of spectroscopy systems can be used. Various physical orientations for the detection system are available and discussion of important design parameters is provided, e.g., in Jovin, Adv. in Biochem. Bioplyms, which is hereby incorporated herein by reference.

Various labels which are easily detected include radioactive labels, heavy metals, optically detectable labels, spectroscopic labels and the like. Various photoluminescent labels include those described in U.S. Ser. No. 07/624,114 (sequencing by hybridization). Protection and deprotection are described, e.g., in McCray, et al. (1989) Ann. Rev. Biophysical Chemistry 18:239-270, and U.S. Ser. No. 07/624,120 (automated VLSIPS), each of which is hereby incorporated herein by reference.

With a processing system, the speed of scanning may be dramatically increased with a system which only scans positions where known clusters of polymer are attached. This allows the scanning mechanism to skip over areas which have been determined to lack any polymer clusters and avoids loss of time in scanning useless regions of the matrix. Moreover, various problems with spurious or overlapping signals may be adjusted for by appropriate analysis.
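A scan restricted to known cluster positions might be planned as in the following sketch. The grid representation and the serpentine visiting order (alternating direction on successive occupied rows to shorten stage travel) are assumptions chosen for illustration, not features recited in the specification.

```python
def scan_plan(occupied):
    """Order the known cluster positions for a scan that skips empty regions.

    occupied: set of (row, col) grid positions previously found to hold
              polymer clusters (hypothetical representation).
    Returns the positions in visiting order, alternating column direction
    on successive occupied rows.
    """
    plan = []
    rows = sorted({row for row, _ in occupied})
    for index, row in enumerate(rows):
        cols = sorted(col for r, col in occupied if r == row)
        if index % 2 == 1:
            cols.reverse()  # return pass: travel back across the row
        plan.extend((row, col) for col in cols)
    return plan
```

Rows and columns that contain no clusters never appear in the plan, so no time is spent on them.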

A scanning apparatus which may be used for the presently described uses is schematically illustrated in FIG. 7. A substrate 52 is placed on an x-y translation table 54. In a preferred embodiment the x-y translation table is a model no. PM500-A1 manufactured by Newport Corporation. The x-y translation table is connected to and controlled by an appropriately programmed digital computer 56 which may be, for example, an appropriately programmed IBM PC/AT or AT compatible computer. Of course, other computer systems, special purpose hardware, or the like could readily be substituted for the AT computer used herein for illustration. Computer software for the translation and data collection functions described herein can be provided based on commercially available software including, for example, “Lab Windows” licensed by National Instruments, which is incorporated herein by reference for all purposes.

The substrate and x-y translation table are placed under a microscope 58 which includes one or more objectives 60. Light (about 488 nm) from a laser 62, which in some embodiments is a model no. 2020-05 argon ion laser manufactured by Spectraphysics, is directed at the substrate by a dichroic mirror 64 which passes greater than about 520 nm wavelength light but reflects 488 nm light. Dichroic mirror 64 may be, for example, a model no. FT510 manufactured by Carl Zeiss. Light reflected from the mirror then enters the microscope 58 which may be, for example, a model no. Axioscop 20 manufactured by Carl Zeiss. Fluorescein-marked materials on the substrate will fluoresce at wavelengths greater than 488 nm, and the fluoresced light will be collected by the microscope and passed through the mirror. The fluorescent light from the substrate is then directed through a wavelength filter 66 and thereafter through an aperture plate 68. Wavelength filter 66 may be, for example, a model no. OG530 manufactured by Melles Griot and aperture plate 68 may be, for example, a model no. 477352/477380 manufactured by Carl Zeiss.

The fluoresced light then enters a photomultiplier tube 70 which in one embodiment is a model no. R943-02 manufactured by Hamamatsu. The signal is amplified in preamplifier 72 and photons are counted by photon counter 74. The number of photons is recorded as a function of the location in the computer 56. Preamplifier 72 may be, for example, a model no. SR440 manufactured by Stanford Research Systems and photon counter 74 may be a model no. SR430 manufactured by Stanford Research Systems. The substrate is then moved to a subsequent location and the process is repeated. In preferred embodiments the data are acquired every 1 to 100 μm, with a data collection diameter of about 0.8 to 10 μm preferred. In embodiments with sufficiently high fluorescence, a CCD detector with broadfield illumination is utilized.
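The move-count-record loop described above can be sketched in software as follows. The stage and photon counter are represented by two caller-supplied functions, `move_to` and `count_photons`; these are hypothetical stand-ins and do not correspond to any actual driver interface for the instruments named.

```python
def raster_scan(move_to, count_photons, width_um, height_um, step_um):
    """Record photon counts as a function of substrate location.

    move_to(x, y): positions the translation table (hypothetical callback).
    count_photons(): returns the photon count at the current position
                     (hypothetical callback).
    Returns a dict mapping (x, y) in micrometers to the recorded count.
    """
    nx = int(width_um // step_um) + 1
    ny = int(height_um // step_um) + 1
    counts = {}
    for j in range(ny):
        for i in range(nx):
            x, y = i * step_um, j * step_um
            move_to(x, y)                     # move substrate to next location
            counts[(x, y)] = count_photons()  # record count for this location
    return counts
```

In use, the returned map from one cycle would be compared with the maps from earlier and later cycles at the same coordinates.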

By counting the number of photons generated in a given area in response to the laser, it is possible to determine where fluorescent marked molecules are located on the substrate. Consequently, for a substrate which has a matrix of polypeptides, for example, synthesized on the surface thereof, it is possible to determine which of the polypeptides has incorporated a fluorescently marked monomer.

According to preferred embodiments, the intensity and duration of the light applied to the substrate are controlled by varying the laser power and scan stage rate for improved signal-to-noise ratio by maximizing fluorescence emission and minimizing background noise. Signal analysis may improve the resolution and reliability of the system. The time of photon counting may be varied at various positions to provide high signal to background or noise.

D. Synthetic or Degradative Cycle

The present invention provides a substrate with positionally separated polymers for sequencing. The separation may be by solid phase carriers separated in separate wells, by separately manipulable carriers such as beads or marbles, or by physical separation of regions on a two-dimensional substrate surface. Each cluster region is a target for the sequencing reactions. Although the reactions are, in various embodiments, performed on all the clusters together, each cluster can be individually analyzed by following the results from the sequence of reactions on polymer clusters at positionally defined locations.

The synthetic mode, as illustrated in FIG. 1, is easily applied to the sequencing of nucleic acids, since one target strand may serve as the template to synthesize the complementary strand. The nucleic acid can be DNA, RNA or mixed polymers. For the purposes of illustration, and not by limitation, the sequencing steps for DNA are described in detail. The synthetic mode, an example of which is depicted in FIG. 8 for nucleotides, may also be useful in circumstances where synthesis occurs in response to a known polymer sequence. The synthetic scheme depends, in part, on the stepwise elongation by small and identifiable units. A polymerase is used to extend a primer complementary to a target template. The primer is elongated one nucleotide at a time by use of a particular modified nucleotide analog to which a blocking agent is added and which prevents further elongation. This blocking agent is analogous to the dideoxy nucleotides used in the Sanger and Coulson sequencing procedure, but in certain embodiments here, the blockage is reversible. This analog is also labeled with a removable moiety, e.g., a fluorescent label, so that the scanning system can detect the particular nucleotide incorporated after its addition to the polymerization primer.

Panel 4A illustrates the cycle of sequence reactions in one embodiment. The template polymer 82 located at a particular site has already been linked to the substrate. The template 82 and complementary primer 84 are hybridized. Often, the primer 84 is common to all of the target template sequences, selected by its common occurrence on a selected cloning vector. The primer 84 is also often covalently crosslinked to the target template 82 using psoralen and U.V. light.

Labeled and blocked monomers 86 are shown, the label depicted by the asterisk and the polymerization blocking groups indicated by B. A compatible polymerase 88 which can elongate the primer with the labeled blocked monomers 86 is used in reaction 1. In the preferred embodiment, the separate labeled monomers can be distinguished from one another by the wavelength of fluorescent emission.

In the example illustrated, a labeled blocked guanosine monomer has been incorporated into the elongated primer 90.

Step 2 is a scan, where the signal at the position corresponding to template 82 indicates that the guanosine analog was incorporated. Reaction 2 is performed, which removes both the label and blocking group. It will be recognized that the blocking group prevents elongation by any more than a single nucleotide in each reaction cycle. Reaction 3 is equivalent to reaction 1, though the substrate primer has been elongated by one monomer.

Panel B illustrates the scheme in a logic flow chart. The template 82 is attached to the substrate, either directly or through the primer. Reaction 1 elongates the primer by a single labeled blocked nucleotide. A scan step is performed and the blocking and labeling agents are removed. The elongation reaction is performed and the cycle repeated.
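The logic flow of the cycle can be expressed as a short idealized simulation. It assumes error-free incorporation of exactly one complementary monomer per cycle, complete removal of label and blocking group, and a DNA template listed in the order in which the polymerase copies it; these idealizations and the function name are illustrative only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles):
    """Simulate elongate / scan / deblock cycles at one template position.

    template: template bases in the order they are copied.
    cycles: number of reaction cycles to run.
    Returns the string of monomers reported by the successive scans.
    """
    read = []
    for position in range(min(cycles, len(template))):
        # Reaction 1: a single labeled, blocked monomer is incorporated.
        incorporated = COMPLEMENT[template[position]]
        # Scan: the label identifies the incorporated monomer.
        read.append(incorporated)
        # Reaction 2: label and blocking group are removed, so the
        # next cycle can extend the primer by one further monomer.
    return "".join(read)
```

For a template beginning with C, the first scan reports G, matching the guanosine incorporation in the illustrated example.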

For a nucleic acid, a unit for addition would typically be a single nucleotide. Under certain circumstances, dimers or trimers or larger segments may be utilized, but a larger number of different possible nucleotide elements requires high distinguishability in other steps. For example, there are only four different nucleotide monomer possibilities, but there are sixteen different dimer possibilities. The distinction among four possibilities is more precise and simple than among sixteen dimer possibilities. To prevent elongation by a unit length greater than one monomer, the nucleotide should be blocked at the position of 3′ elongation. Usually, the nucleotide will be blocked at the 3′ hydroxyl group where successive nucleotides would be attached. In contrast to a dideoxy nucleotide, typically the blocking agent will be a reversible blocking agent thereby allowing for deblocking and subsequent elongation.
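The growth in the number of units that must be distinguished follows directly from the four-letter alphabet: a unit of length k has 4^k possible identities. A brief enumeration, given only as an illustration of the counting argument:

```python
from itertools import product


def addition_units(k, alphabet="ACGT"):
    """List every possible addition unit of k nucleotides."""
    return ["".join(unit) for unit in product(alphabet, repeat=k)]


# Monomers, dimers and trimers: 4, 16 and 64 distinguishable units.
for k in (1, 2, 3):
    print(k, len(addition_units(k)))
```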

Variations may be easily incorporated into the procedure. If the labels on the monomers are not distinguishable, successive substrate scans can be performed after each monomer is provided with conditions allowing its incorporation. Alternatively, a small fraction of permanently blocked but reversibly labeled monomers may be incorporated. Those specific molecules which incorporate the blocked monomers are permanently removed from further polymerization, but such is acceptable if the labeling moiety is also removed.

1. Other Monomers

One important functional property of the monomers is that the label be removable. The removal reaction will preferably be achieved using mild conditions. Blocking groups sensitive to mild acidic conditions, mild basic conditions, or light are preferred. The label position may be anywhere on the molecule compatible with appropriate polymerization, i.e., complementary to the template, by the selected polymerase. A single polymerase for all of the modified nucleotides is preferred, but a different polymerase for each of the different monomers can be used.

Nucleotide analogs used as chain-terminating reagents will typically have both a labeling moiety and a blocking agent while remaining compatible with the elongation enzymology. As the blocking agent will usually be on the 3′ hydroxyl position of the sugar on a nucleotide, it would be most convenient to incorporate the label and the blocking agent at the same site, providing for a simple reaction for simultaneous removal of the label and blocking agent. However, it is also possible to put a label on another portion of the nucleotide analog than the 3′ hydroxyl position of the sugar, thereby requiring a two-step reaction cycle for removing the blocking and labeling groups.

Analogs will be found by selecting for suitable combinations of appropriate nucleotides with compatible polymerases. In particular, it is desired that a selected polymerase be capable of incorporating a nucleotide, with selectivity, having both the blocking moiety and the label moiety attached. It has been observed that RNA polymerases are less fastidious with respect to the nucleotide analogues which will be polymerized into a growing chain. See, e.g., Rozovskaya, T., et al. (1977) Molekulyarnaya Biologiya, 11:598-610; Kutateladze, T., et al. (1986) Molekulyarnaya Biologiya, 20:267-276; and Chidgeavadze, Z., et al. (1985) FEBS Letters, 183:275-278. Moreover, those references also indicate that rather significant chemical moieties may be attached at the 2′ or 3′ positions on a nucleotide, and still be correctly incorporated at the growing chain terminus.

In particular, it is not necessary that the same nucleotide have both the reversible blocking moiety and the removable labeling moiety, as a combination of two separate nucleotide analogues could be utilized, e.g., N1, which is reversibly blocked and not labeled, and N2, which is irreversibly blocked but removably labeled. Note that the removal of label may be effected by destruction of the label, e.g., of its fluorescence, or preferably by removal. Both of these nucleotides might be, for instance, A analogues. With the mixture, at an appropriate sequence position of a target sequence, N1 and N2 nucleotides can be incorporated at an appropriate ratio, and these can be polymerized by either two separate polymerases, or preferably a single polymerase.

For example, two separate polymerases might be necessary, P1 which incorporates N1, and P2 which incorporates N2. At the given location in the sequence, some of the growing polymers will incorporate N1 with P1 polymerase, and others will incorporate N2 with the P2 polymerase. The proportions of N1, N2, P1, and P2 may be titrated to get the desired fractional proportions of the N1 reversibly blocked nucleotides and the N2 labeled but irreversibly blocked nucleotides.

As all of the growing chains have blocked nucleotides, no elongation takes place beyond a single nucleotide. The N2 nucleotides provide a specific label, detected in the scanning step. After determination of the incorporated label, the label may be removed or destroyed, and those irreversibly terminated growing chains become permanently removed from further participation in the sequencing process. Photodestruction may be achieved by a high intensity laser beam of the correct wavelength. See, e.g., March (1977) Advanced Organic Chemistry: Reactions, Mechanisms and Structure (2d Ed) McGraw; and Carey and Sundberg (1980) Advanced Organic Chemistry: part A Structure and Mechanisms, Plenum.

Next, the reversible blocking moiety is removed, providing a new set of slightly longer polymers ready for the next step. Of course, the amount of label necessary to be incorporated must be detectable, preferably with a clear, unambiguous positive signal. The amount of label incorporated will depend, in part, upon the conditions in the polymerizing step and the relative incorporation of the N1 and N2 nucleotides. The proportions of the nucleotides, polymerases, and other reagents may be adjusted to appropriately incorporate the desired proportions of the nucleotides.
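The consequence of titrating the N1 and N2 proportions can be estimated with a simple model. Assume, for illustration only, that in every cycle a constant fraction f of the still-active chains incorporates N2 and is permanently terminated. The labeled fraction of the original chains at cycle n is then f(1 - f)^(n-1), and the sketch below counts how many cycles remain above a chosen detection threshold. The model and the names are assumptions, not part of the described method.

```python
def labeled_fraction(f, cycle):
    """Fraction of the original chains carrying label at a given cycle
    (cycle numbering starts at 1), for a constant N2 fraction f."""
    return f * (1.0 - f) ** (cycle - 1)


def usable_cycles(f, min_signal):
    """Number of consecutive cycles whose labeled fraction stays at or
    above min_signal (expressed as a fraction of the original chains)."""
    if not 0.0 < f < 1.0 or min_signal <= 0.0:
        raise ValueError("need 0 < f < 1 and min_signal > 0")
    n = 0
    while labeled_fraction(f, n + 1) >= min_signal:
        n += 1
    return n
```

A larger f gives a stronger signal in early cycles but depletes the active chains sooner, so the proportion would be chosen against the sensitivity of the detector and the read length desired.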

In an embodiment where a single polymerase will incorporate both N1 and N2, the relative proportions and conditions to get the correct incorporation levels of the two nucleotides can be titrated. In an alternative preferred embodiment, a single nucleotide will have both the removable label and the reversible blocking moiety.

A similar approach may be necessary where only some fraction of the nucleotide analogues is labeled. Separate polymerases might also be useful for such situations, and each polymerase may have special conditions necessary for activity.

Procedures for selecting suitable nucleotide and polymerase combinations will be readily adapted from Ruth et al. (1981) Molecular Pharmacology 20:415-422; Kutateladze, T., et al. (1984) Nuc. Acids Res., 12:1671-1686; Kutateladze, T., et al. (1986) Molekulyarnaya Biologiya 20:267-276; Chidgeavadze, Z., et al. (1985) FEBS Letters, 183:275-278; and Rozovskaya, T., et al. (1977) Molekulyarnaya Biologiya 11:598-610.

The determination of termination activity is done in two steps. First, nucleotide analogues are screened for the ability of the compound to inhibit polymerase activity. Then the nucleotide analogue is tested for base-specific termination as manifested by generating a correct DNA sequencing ladder on a template of known sequence. The appropriate reaction conditions are those used for conventional sequencing reactions with the respective polymerases. The conditions are then modified in the usual ways to obtain the optimal conditions for the particular terminator compound (e.g., concentration of terminator, ratio of terminator to dNTP, Mg⁺⁺, and other reagents critical to proper polymerase function).

By way of example, an approach employing the polymerase known as reverse transcriptase (AMV) will be described. The initial conditions are essentially as described by Prober, et al. (1987) Science 238:336-341.

A nucleotide analogue is first selected from the group available from a commercial source such as Amersham, New England Nuclear, or Sigma Chemical Company. In particular, nucleotides which are reversibly blocked from further elongation, especially at the 5′ or 3′-OH, will be used.

General properties which are desired have been described. Each of these analogs can be tested for compatibility with a particular polymerase by testing whether such polymerase is capable of incorporating the labeled analog. Various polymerases may be screened, either natural forms of the mentioned types, or variants thereof. Polymerases useful in connection with the invention include E. coli DNA polymerase (Klenow fragment); see Klenow and Henningsen (1970) Proc. Nat'l Acad. Sci. USA 65:168-175; and Jacobsen et al. (1974) Eur. J. Biochem. 45:623-627; modified and cloned versions of T7 DNA polymerase (Sequenase™ and Sequenase 2.0™); see Tabor and Richardson (1987) Proc. Nat'l Acad. Sci. USA 84:4767-4771; and Tabor and Richardson (1987) J. Biol. Chem. 262:15330-15333; Taq DNA polymerase from thermostable Thermus aquaticus; see Chien et al. (1976) J. Bacteriol. 127:1550-1557; and its cloned version Amplitaq; see Saiki and Gelfand (1989) Amplifications 1:46; T4 DNA polymerase; see Nossal (1974) J. Biol. Chem. 249:5668-5676; and various reverse transcriptases, both RNA- and DNA-dependent DNA polymerases, e.g., from avian retroviruses; see Houts (1970) J. Virology 29:517-522; and murine retroviruses; see Kotewicz et al. (1985) Gene 85:249-258; Gerard et al. (1986) DNA 5:271-279; and Bst polymerase; see Ye, S. and Hong (1987) Scientia Sinica 30:503-506.

In order to ensure that only a single nucleotide is added at a time, a blocking agent is usually incorporated onto the 3′ hydroxyl group of the nucleotide. Optimally, the blocking agent should be removable under mild conditions (e.g., photosensitive, weak acid labile, or weak base labile groups), thereby allowing for further elongation of the primer strand with a next synthetic cycle. If the blocking agent also contains the fluorescent label, the dual blocking and labeling functions will be achieved without the need for separate reactions for the separate moieties.

The blocking group should have the functional properties of blocking further elongation of the polymer. Additional desired properties are reversibility and inertness to the sequencing reactions. Preferably, where an enzymatic elongation step is used, the monomers should be compatible with the selected polymerase. Specific examples for blocking groups for the nucleic acids include acid or base labile groups at the 3′ OH position. See, e.g., Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford.

A DNA-dependent DNA polymerase is the polymerase of choice. Polymerases used for conventional DNA sequencing, for example, Klenow fragment of E. coli DNA Pol, Sequenase (modified T7 DNA polymerase), Taq (Thermus aquaticus) DNA polymerase, Bst (Bacillus stearothermophilus) DNA polymerase, reverse transcriptase (from AMV, MMLV, RSV, etc.) or other DNA polymerases will be the polymerases of choice. However, there is a functional constraint that the polymerase be compatible with the monomer analogues selected. Screening will be performed to determine appropriate polymerase and monomer analog combinations.

Removal of the blocking groups may also be unnecessary if the labels are removable. In this approach, the chains incorporating the blocked monomers are permanently terminated and will no longer participate in the elongation processes. So long as these blocked monomers are also removed from the labeling process, a small percentage of permanent loss in each cycle can also be tolerated.

The fluorescent label may be selected from any of a number of different moieties. The preferred moiety will be a fluorescent group for which detection is quite sensitive. Various different fluorescence-labeling techniques are described, for example, in Kambara et al. (1988) “Optimization of Parameters in a DNA Sequenator Using Fluorescence Detection,” Bio/Technol. 6:816-821; Smith et al. (1985) Nucl. Acids Res. 13:2399-2412; and Smith et al. (1986) Nature 321:674-679, each of which is hereby incorporated herein by reference. Fluorescent labels exhibiting particularly high coefficients of destruction may also be useful in destroying nonspecific background signals.

Appropriate blocking agents include, among others, light sensitive groups such as 6-nitroveratryloxycarbonyl (NVOC), 2-nitrobenzyloxycarbonyl (NBOC), α,α-dimethyl-dimethoxybenzyloxycarbonyl (DDZ), 5-bromo-7-nitroindolinyl, o-hydroxy-2-methyl cinnamoyl, 2-oxymethylene anthraquinone, and t-butyl oxycarbonyl (TBOC). Other blocking reagents are discussed, e.g., in Ser. No. 07/492,462; Patchornik (1970) J. Amer. Chem. Soc. 92:6333; and Amit et al. (1974) J. Org. Chem. 39:192, all of which are hereby incorporated herein by reference. Additional blocking agents attached to particular positions may be selected according to the functional directives provided herein.

FIG. 9 schematically illustrates the synthesis of a generic protected nucleotide. A suitable nucleotide is labeled with the FMOC fluorescently detectable label by reaction under the conditions described, e.g., in U.S. Ser. No. 07/624,114 (sequencing by hybridization), FMOC-Cl, and H₂O. A protection moiety will be added using conditions also described there.

Various nucleotides possessing features useful in the described method can be readily synthesized. Labeling moieties are attached at appropriate sites on the nucleotide using chemistry and conditions as described, e.g., in Gait (1984) Oligonucleotide Synthesis. Blocking groups will also be added using conditions as described, e.g., in U.S. Ser. No. 07/624,114 (sequencing by hybridization). FIG. 9 also outlines various reactions which lead to useful nucleotides.

Additionally, the selected polymerases used in elongation reactions should be compatible with nucleotide analogs intended for polymerization to the primer. Simple screening procedures for nucleotide and polymerase combinations may be devised to verify that a particular combination is functional. A test using primer with template which directs the addition of the nucleotide analog to be incorporated will determine whether the combination is workable. Natural polymerases or variants thereof may be used under particular defined conditions.

The degradative scheme is generally illustrated in FIG. 1; an example more generally applicable to biological macromolecular polymers is depicted in FIG. 10. This method is useful for a wider variety of polymers without the limitations imposed by the need to replicate the polymer. The degradative sequencing technique depends, in part, upon the ability to specifically label or distinguish between various different terminal monomers at particular matrix positions. Reactions for specific removal of a defined monomer unit are important.

This monomer distinguishability can arise from an ability to differentiate between label on the various possible monomers in the polymer. As a second means, distinguishability can come from specific reagents which react with particularity on different monomers. Thus, for instance, labels may be used which generally attach to the terminal nucleotide, but whose fluorescent signal differs depending upon the nucleotide. As a third means, a reagent which specifically affects the label on only one monomer may be used, as described below.

In the first example, every polymer cluster will be labeled at a particular end, e.g., the 5′ end, without specificity for the monomer located there. The scan step will be able to distinguish the terminal monomers, after which each labeled terminal monomer is specifically removed. The general label step is repeated in the cycle as described.

In the second means for distinguishability, reagents are used which produce a signal which is dependent upon the terminal nucleotide. For example, a labeling molecule which binds only to one specific terminal monomer will provide a monomer specific label. This will provide a cycle much like the first means for distinguishability where the properties of the label are different depending upon the terminal nucleotide to which each specific labeling reagent binds.

In the third means for distinguishability, an individual reagent labels or affects only a specific terminal monomer. Polymers susceptible to each reagent by virtue of terminating with the corresponding monomer will have their labels specifically affected. A scan of the matrix after each step and comparison with the earlier scans will determine which positions correspond to polymers ending with a susceptible monomer. Performing a removal step with a second monomer-specific reagent followed by a scan will identify those positional locations having polymer clusters ending with that second monomer. A similar reagent for the other possible monomers will further define all of the possibilities. Finally, when all of the possible monomers have been removed, the labeling reaction may be repeated and the succession of specific reagent and scanning steps will also be repeated. This procedure allows for a succession of automated steps to determine the sequence of the polymer clusters localized to distinct positions.
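The scan comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a procedure taken from this disclosure: the function name, the dictionary representation of a scan (matrix position mapped to fluorescence intensity) and the 50% loss threshold are all assumptions chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

def call_terminal_monomers(
    baseline: Dict[str, float],
    scans: List[Tuple[str, Dict[str, float]]],
    loss_fraction: float = 0.5,  # hypothetical threshold for "label lost"
) -> Dict[str, Optional[str]]:
    """Assign a terminal monomer to each position from successive scans.

    baseline: scan taken after uniform labeling, position -> intensity.
    scans: (monomer, scan) pairs, in the order the specific reagents were applied.
    A position is assigned to the first reagent after which its signal drops
    by more than loss_fraction relative to the preceding scan.
    """
    calls: Dict[str, Optional[str]] = {pos: None for pos in baseline}
    previous = dict(baseline)
    for monomer, scan in scans:
        for pos, signal in scan.items():
            if calls[pos] is None and signal < previous[pos] * (1.0 - loss_fraction):
                calls[pos] = monomer
        previous = dict(scan)
    return calls

# Three clusters, scanned after A-, C- and G-specific removal steps.
baseline = {"a1": 100.0, "a2": 95.0, "b1": 102.0}
scans = [
    ("A", {"a1": 8.0, "a2": 94.0, "b1": 100.0}),
    ("C", {"a1": 7.0, "a2": 93.0, "b1": 12.0}),
    ("G", {"a1": 7.0, "a2": 10.0, "b1": 11.0}),
]
calls = call_terminal_monomers(baseline, scans)
# calls == {"a1": "A", "a2": "G", "b1": "C"}
```

Comparing each scan with the one immediately preceding it, rather than with the baseline alone, keeps gradual signal loss from being mistaken for a specific removal.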

Finally, a combination of both specificity of reagent and ability to distinguish label on different monomers can be utilized. Neither alone need be relied upon exclusively. Thus, in the case of nucleotides, an ability to distinguish into two separate classes of nucleotides, e.g., A and C from G and T, combined with specific reagents for distinguishing between the indistinguishable label pairs, e.g., in the example provided, A from C, or G from T, can also provide sufficient information for sequencing.

Instead of performing four specific reactions on the same substrate matrix, each of the four individual reactions can be performed on separate parallel matrices. Four separate substrate matrices may be made by replica plating or successive transfers, each matrix having the same spatial distribution of polymer clusters. Thus, each separate substrate can be subjected to only a single specific reagent in a highly optimized reaction. On each cycle, one out of the four parallel substrates should show a signal indicating the monomer at the terminus for the cluster at a given matrix position.
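As an illustration of how results from four parallel substrates might be correlated, the following Python sketch calls a base at a position only when exactly one of the four matrices shows a signal there. The data layout, the function name and the intensity threshold are assumptions made for the example.

```python
from typing import Dict, Optional

def merge_parallel_scans(
    signals: Dict[str, Dict[str, float]],
    threshold: float,  # hypothetical intensity cutoff for "signal present"
) -> Dict[str, Optional[str]]:
    """Combine one scan per monomer-specific substrate into per-position calls.

    signals: monomer -> (position -> intensity) for that monomer's substrate.
    Returns None for a position when zero or several substrates show a signal,
    since the scheme expects exactly one.
    """
    positions = set().union(*(scan.keys() for scan in signals.values()))
    calls: Dict[str, Optional[str]] = {}
    for pos in sorted(positions):
        hits = [m for m, scan in signals.items() if scan.get(pos, 0.0) >= threshold]
        calls[pos] = hits[0] if len(hits) == 1 else None
    return calls

parallel = {
    "A": {"p1": 90.0, "p2": 3.0, "p3": 2.0},
    "C": {"p1": 4.0, "p2": 85.0, "p3": 1.0},
    "G": {"p1": 2.0, "p2": 5.0, "p3": 3.0},
    "T": {"p1": 1.0, "p2": 80.0, "p3": 2.0},
}
merged = merge_parallel_scans(parallel, threshold=50.0)
# merged == {"p1": "A", "p2": None, "p3": None}
```

Returning no call for ambiguous positions, instead of picking the strongest signal, is a conservative design choice for the sketch; other policies are equally possible.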

Likewise, two parallel substrates can be provided, and each of the parallel substrates is used to determine two of the four possible nucleotides at each position. Instead of treating a single matrix with four separate reactions, this approach allows treating each of two substrates with only two separate reactions. By minimizing the number of reactions to which each chip is exposed, the side reactions will be minimized, the chemistry will be optimized, and the number of cycles through which a matrix will survive will be optimized. This provides an advantage in the number of cycles to which a matrix can be subjected before the signal becomes indistinguishable from the noise.

E. Label

The label is important in providing a detectable signal. The signal may be distinguishable among the various monomers by the nature of the signal, e.g., wavelength or other characteristic, as described in Prober et al. (1987) Science 238:336-341. A monomer-specific reagent can allow determination of whether each position has a particular terminal monomer by the presence or loss of label.

The label on the monomer may be attached by a noncovalent attachment, but will be preferably attached by a direct covalent attachment. The label will typically be one which is capable of high positional resolution and does not interfere with the nucleotide-specific chemistry or enzymology. Although many different labels may be devised including enzyme linked immunosorbent assays (ELISA), spectrophotometric labels, light producing or other labels, a fluorescent moiety is the preferred form. For example, an avidin/biotin type affinity binding may be useful for attaching a particular label. Alternatively, an antibody may be used which is specific for binding to a particular terminal monomer. A wide variety of other specific reagents can be used to provide a labeling function. See, for example, U.S. Ser. No. 07/624,114 (sequencing by hybridization), which is hereby incorporated herein by reference.

The means of detection utilized will be selected in combination with various other considerations. In some circumstances, a spectroscopic label may be most compatible with a particular monomer. Enzyme linked assays with a spectrophotometric detection system are a workable system. Phosphorescent or light producing assays provide high sensitivity using charge-coupled devices. Fluorescent systems provide the same advantages, especially where the incident light beam is a laser. The fluorescent label also may provide the added advantage of fluorescing at different wavelengths for the different monomers, providing a convenient means to distinguish between different monomers. Other forms of label may be desired for various reasons, for example, magnetic labels, radioactive labels, heavy metal atoms, optically detectable labels, spectroscopically detectable labels, and fluorescent labels.

For sequencing nucleic acids by this method, the labeled monomers are simpler than those monomers used for the synthetic method. The blocking group is unnecessary, but terminal specific reagents are more difficult to produce.

The preferred attachment sites will be at the same location as the blocking site, so a combined label and blocking moiety is more preferred. The label will be attached as described, e.g., in U.S. Ser. No. 07/624,114 (sequencing by hybridization).

Two types of degradation cycles can be used, either non-specific removal of the terminal labeled nucleotide, or a base-specific removal. With the non-specific removal means, each of the end monomers, when labeled, should be distinguishable from the other three monomer possibilities. This allows for determination of the terminal nucleotide for the cluster localized at a given matrix position. Then the terminal, labeled nucleotides are non-specifically removed and the newly exposed terminal nucleotides will be again distinguishably labeled.

By this scheme, a specific label for each of the different nucleotides may be provided. For example, fluorescent reagents specific for each of the nucleotides may provide a signal with a different wavelength. This will more usually occur when the fluorescent probe is located near the base moiety of the nucleotide. In the scanning step, the regions terminating with each of the four different nucleotides may be determined. Then, a reaction is performed removing the labeled terminal nucleotides from all of the polymers. This removal may be either enzymatic, using a phosphatase, an exonuclease or other similar enzyme, or chemical, using acid, base, or some other, preferably mild, reagent. Again, the reactions are performed which label each of the terminal nucleotides and a scan step repeated in the same manner.

In the base-specific removal scheme, nucleotide-specific removal can be performed. For example, an enzyme which will function to remove only a single modified nucleotide, e.g., a 5′-fluorescein-dAMP-specific exonuclease, is constructed. This may be achieved by proper construction of a catalytic antibody. Other similar reagents may be generated for each of the other labeled nucleotide monomers.

Catalytic or derivatized antibodies to catalyze the removal of the 3′-end or 5′-most fluorescent base in a base-specific manner may be constructed as follows. A recombinant antibody library or a series of monoclonal antibodies is screened with fluorescent donor-quencher substrates. These substrates consist of a fluorescent labeled base (A, C, G, or T) on the 5′ or 3′ end joined by a 5′ to 3′ phosphodiester linkage to a second base. A collection of all four possible second bases for each of the four end bases gives the best selection target for the required non-specificity with respect to the second base. The second base is then tethered to an acceptor group in sufficient proximity to quench the fluorescence of the end group. In the presence of a catalytic antibody with cleaving activity, a fluorescent signal occurs from the separation of the quenching group from the terminal fluorescent label. To assure both base and end specificity, the positive monoclonal antibody clones are rescreened against the other substrates.

Upon selection of an antibody exhibiting the desired specificity (or lack thereof), the reactive group for cleavage may be attached. This cleavage reagent may be chemical or enzymatic and will be attached by an appropriate length linker to the antibody binding site in an orientation which is consistent with the steric requirements of both binding and specific cleavage.

Particularly useful specific reagents may be produced by making antibodies specific for each of the four different modified terminal nucleotide bases. These antibodies would then specifically bind only to polymers terminating in the appropriate base analog. By combining a cleavage reagent to the specific antibody, a terminal nucleotide specific cleavage reagent is generated.

In one example of the degradative embodiment, all of the polymers may be uniformly labeled at a particular end. Thereafter, a specific removal reaction which removes only a particular nucleotide may be performed, leaving the three other nucleotides labeled. Thereafter, a scanning step is performed through which all regions which had incorporated that particular nucleotide will have lost the label through specific removal. Then, the second specific reagent will be applied which specifically removes the second labeled nucleotide, and the scanning step following that reaction will allow determination of all regions which lose the second particular nucleotide. This process is repeated with reagents specific for each of the last two remaining labeled nucleotides interspersed with scanning steps, thereby providing information on regions with each of the nucleotides located there. Then, the entire process may be repeated by labeling the next terminal nucleotides uniformly. As mentioned below, replication techniques may allow for making four separate but identical matrix substrates. Each substrate may be subjected to single nucleotide-specific reactions, and the scan results correlated with each of the other parallel substrates.

In the degradation scheme, the polynucleotide linkage to the matrix must be more carefully selected such that the free end of the oligonucleotide segments used for attachment will not interfere with the determinations of the target sequence terminus.

F. Utility

The present sequencing method is useful to monitor and check the accuracy and reliability of the synthetic processes described in U.S. Ser. No. 07/362,901 (VLSIPS parent) and U.S. Ser. No. 07/492,462, now U.S. Pat. No. 5,143,854 (VLSIPS CIP). The present method can be used to check the final products synthesized therein, or to label each monomer as it is added stepwise to monitor the efficiency and accuracy of those synthetic methods.

The present invention can also be used to monitor or sequence matrix bound clusters of positionally distinct polymers. This sequencing process provides the capability of simultaneously sequencing a large plurality of distinct polymers which are positionally segregated.

The method will be used to sequence extremely large stretches of polymer, e.g., nucleic acids. A large number of shorter segments of a large sequence can be sequenced with alignment of overlaps either randomly generated, or in an ordered fashion, or particular sequenceable segments of a large segment can be generated. In one approach, a large segment is subcloned into smaller segments and a sufficient number of the randomly generated subclones are sequenced as described herein to provide sequence overlap and ordering of fragments.
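The ordering of randomly generated subclone sequences by their overlaps can be illustrated with a simple greedy merge. The sketch below is not part of the disclosed method; it assumes error-free reads from a single strand, a hypothetical minimum overlap of three monomers, and function names chosen only for the example.

```python
from typing import List

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the largest overlap until none remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining reads do not overlap; leave them as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

contigs = greedy_assemble(["ATGCGT", "GCGTAC", "TACGGA"])
# contigs == ["ATGCGTACGGA"]
```

A greedy merge of this kind can be misled by repeated sequence; it is shown only to make the idea of overlap-based ordering concrete.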

In an alternative approach, a large segment can be successively digested to generate a succession of smaller sized subclones with ends separated by defined numbers of monomers. The subclones can be size sorted by a standard separation procedure and the individual samples from a separation device manually or automatically linked to a matrix in a defined positional map. Fractions resulting from size separation can be spatially attached at defined positions, often at adjacent positions. Then polymer sequences at adjacent positions on the matrix will also be known to have ends which differ by, e.g., approximately 25 or 50 or more monomers, thereby providing significantly greater confidence in overlapping sequence data.

II. SPECIFIC EMBODIMENTS

A specific series of reactions for sequencing a matrix of polynucleotides is described.

A. Synthetic Method

This method involves annealing a primer (common to all the attached sequences by virtue of the cloning construction) near to the 3′ end of the unknown target sequences. DNA polymerase, or a similar polymerase, is used to extend the chains by one base by incubation in the presence of dNTP analogs which function as both chain terminators and fluorescent labels. This is done in a one-step process where each of the four dNTP analogs is identified by a distinct dye, such as described in Prober et al., Science 238:336-341, or in four steps, each time adding one of the four bases, interspersed with a scanning identification step. When each cluster incorporates the proper one of the four bases and the fluorescence scanning is complete, the matrix is stripped of the label and the chain terminators are deblocked for a next round of base addition. Because the base addition is directed by the template strand, the complementary sequence of the fragments at each address of the matrix is deduced.
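The deduction in the last sentence amounts to taking the reverse complement of the bases added at an address. A minimal Python sketch follows; the function name is hypothetical, and the added bases are assumed to be recorded in order of incorporation (5′ to 3′ on the growing strand).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def template_from_additions(added_bases: str) -> str:
    """Return the copied region of the template, written 5' to 3'.

    added_bases: bases incorporated at one matrix address, in order of addition.
    The template is read 3' to 5' during synthesis, so the order is reversed.
    """
    return "".join(COMPLEMENT[base] for base in reversed(added_bases))

template = template_from_additions("ACGGT")
# template == "ACCGT"
```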

1. Attachment to a Surface

Both degradative and synthetic sequencing methods begin by obtaining and immobilizing the target fragments of unknown sequence to be determined at specific locations on the surface.

There are several strategies for photo-directed attachment of the DNA strands to the surface in an orientation appropriate for sequencing. A caged biotin technique, see, e.g., U.S. Ser. No. 07/435,316 (caged biotin parent), and U.S. Ser. No. 07/612,671 (caged biotin CIP), is available. Another technique that is especially applicable for the enzymatic synthesis method is to chemically attach a synthetic oligomer by the 5′ end to the entire surface (see FIG. 6), to activate it for photocrosslinking (with psoralen, for example) and to anneal the complementary strands and photocrosslink the target strand of unknown sequence (complementary to this oligonucleotide at the 3′ end) at the specific location addressed by light. In this case, the oligonucleotide serves as both the attachment linker and as the synthetic primer. A third method is to physically transfer individual nucleic acid samples to selected positions on the matrix, either manually or automatically.

Many sequences in each step are attached by cloning the library into a series of vectors identical except for the sequences flanking the insert. These primers can be added at the point of amplification of the cloned DNA with chimeric primers.

Alternatively, sequences are attached to a matrix substrate by colony or phage immobilization. This directly transfers the positional distribution on a petri plate to a usable substrate. Colonies representing a shotgun collection of sequences (enough to assure nearly complete coverage by overlap) are spread over (or in) a nutrient surface at a density to give about 100 or more colonies or plaques in several square centimeters, and the colonies are allowed to grow to about 0.1 mm in diameter (the maximum possible density of clusters at this size is ˜10,000 colonies/cm²). As described above, replica platings or successive transfers may allow for preparation of multiple matrices with identical positional distributions of polymers. Each separate matrix may then be dedicated to the reactions applicable to a single monomer.
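The density figure quoted above follows from simple geometry. The short calculation below assumes, for illustration, that each colony occupies a square cell whose side equals the colony diameter; the function name is hypothetical.

```python
def max_clusters_per_cm2(diameter_mm: float) -> float:
    """Upper bound on clusters per cm^2 for touching colonies on a square grid."""
    per_cm = 10.0 / diameter_mm  # colonies that fit edge to edge along 1 cm (10 mm)
    return per_cm ** 2

density = max_clusters_per_cm2(0.1)
# about 10,000 colonies per cm^2 for 0.1 mm colonies
```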

For example, in the use of a phage library, on a petri dish, the transfer substrate surface is treated to release DNA from the phage. This is done, e.g., with CHCl₃ vapor, SDS-NaOH, or by heating. Prior to release of DNA, the phage particles are often adsorbed to the surface by way of an antibody to the coat protein that has been immobilized on the surface. This strategy prevents diffusion of the phage from the colonies. The matrix surface is prepared by coating with an oligonucleotide, immobilized to the surface by one end that has homology with the phage vector DNA adjacent to the cloning site.

The matrix surface is juxtaposed to the growth surface, and the phage DNA is allowed to anneal to the immobilized oligonucleotide. The growth surface is removed, and the hybrid is stabilized by psoralen or an equivalent crosslinking reagent.

This method provides an efficient one-step method of placing many DNA fragments onto the detection surface in preparation for sequencing. Although the colonies are not placed in predefined locations, the random arrangement of the clusters allows the final sequence to be assembled from correlation of overlap sequence data derived from each of the defined positions of each target cluster.

Sequences are, in other embodiments, attached by a manual or automated transfer technique. A few cells from each colony in a library are toothpicked into microliter wells. The plate is heated to ˜100° C. for a short period to lyse the cells and release the DNA. The plate is cooled and reagents for cycled amplification of the DNA using, e.g., PCR technology, are added, including primers common to all the cloned sequences. See, e.g., Innis et al. (1990) PCR Protocols: A Guide to Methods and Applications, Academic Press, which is hereby incorporated herein by reference. The DNA is amplified asymmetrically by unbalanced primer concentration to yield an excess of one strand for sequencing and attached to a substrate by manual or automated means.

An alternative form of automated localization is described above in positioning of a succession of smaller sized polymers which are manually or automatically linked to the substrate in a pattern reflecting sequence overlaps.

2. Enzymatic Polymerization Method

The nucleic acid template is, in some embodiments, attached to the surface by either the 5′ or the 3′ end, usually by a method as described above. A preferred method of attachment is to anneal the template to an oligonucleotide attached to the surface and to crosslink the template to the oligonucleotide. Oligonucleotide primers are usually synthesized chemically. In this case, the immobilized oligonucleotide may also serve as a primer for polymerization. Because polymerization proceeds 5′ to 3′ on the primer, the template will be attached by its 3′ end, or a site 3′ proximal to the region to be sequenced, for the purposes of the description to follow.

Step 1: A DNA-dependent DNA polymerase such as those used for conventional DNA sequencing, for example, Klenow fragment of E. coli DNA Pol, Sequenase™ (modified T7 DNA polymerase), Taq (Thermus aquaticus) DNA polymerase, Bst (Bacillus stearothermophilus) DNA polymerase, reverse transcriptase (from AMV, MMLV, RSV, etc.) or other DNA polymerases, and the reaction components appropriate to the particular DNA polymerase selected, are placed in the incubation chamber in direct contact with the surface.

Step 2: Fluorescent chain terminators (analogs of dATP, dCTP, dGTP, and TTP, each labeled with a fluorophore preferably emitting at a distinguishable wavelength) are added to the reaction at a sufficient concentration and under suitable reaction conditions (time, temperature, pH, ionic species, etc., see Sambrook et al. (1989) Molecular Cloning, vols. 1-3, and Prober et al.) to cause essentially all of the chains on the surface to be extended by one base and thereby terminated. Detection of the specific label thereby incorporated into each chain identifies the last base added at each positional address in the matrix.

Step 3: The chain termination should be reversible by some means, such as treatment with light, heat, pH, certain other chemical or biological (enzymatic) reagents, or some combination of these. Typically the chain termination results from a blocking moiety which is labile to mild treatment. By one of these means, the blocked 3′OH of the terminating base must be made available for chain extension in the next round of polymerization.

Step 4: There are several suitable labeled terminator structures, as follows:

(a) The fluorophore itself functions as the chain terminator by placement on the 3′ hydroxyl through a linkage that is easily and efficiently cleaved (removing the label and leaving the free 3′OH) by light, heat, pH shift, etc. The surface is scanned with a scanning system, e.g., the fluorescence detection system described in U.S. Ser. No. 07/492,462, now U.S. Pat. No. 5,143,854 (VLSIPS CIP); and U.S. Ser. No. 07/624,120 (automated VLSIPS). Then, preferably in a single step, the fluorophore is removed and the chain is activated for the next round of base addition.

(b) The fluorophore is placed in a position other than the 3′OH of the nucleoside, and a different group is placed on the 3′OH of the dNTPs to function as a chain terminator. The fluorophore and the 3′ blocking group are removed by the same treatment in a single step (preferably), or they may be removed in separate steps.

(c) An alternative polymer stepwise synthetic strategy can be employed. In this embodiment, the fluorophores need not be removable and may be attached to irreversible chain terminators. Examples of such compounds for use in sequencing DNA include, but are not limited to, dideoxynucleotide triphosphate analogs as described by Prober et al. (1987) Science 238:336-341. A second, unlabeled and reversible, set of terminators is also required. Examples of these compounds are deoxynucleotide triphosphates with small blocking groups such as acetyl, tBOC, NBOC and NVOC on the 3′OH. These groups are easily and efficiently removed under conditions of high or low pH, exposure to light or heat, etc. After each round of base addition and detection, the fluorophores are deactivated by exposure to light under suitable conditions (these chains have their labeling moiety destroyed and remain terminated, taking part in no further reactions). The unlabeled, reversible terminators are unblocked at the 3′OH by the appropriate treatment to allow chain extension in subsequent rounds of elongation. The proportion of chains labeled in each round can be controlled by the concentration ratio of fluorescent to non-fluorescent terminators, and the reaction can be driven to completion with high concentrations of the unlabeled terminators.

(d) A single dye strategy is used where all the base analog terminators carry the same fluorophore and each is added one at a time: A, C, G, T. The addition of each base is followed by scanning detection, so that each scanning step determines whether the immediately preceding labeled nucleotide had been incorporated at each distinct position. After all four base analogs have been added, reversal of the termination is performed, allowing for the addition of the next base analog.

The structures of the fluorescently labeled and reversible terminator base analogs are selected to be compatible with efficient incorporation into the growing chains by the particular DNA polymerase(s) chosen to catalyze extension. For example, where two different chain terminators are used, they may be utilized by two different polymerases that are both present during the chain extension step.

Step 5: An optional step is the permanent capping of chain extension failures with high concentrations of dideoxynucleotide triphosphates. This step serves to reduce the background of fluorescence caused by addition of an incorrect base because of inefficient chain extension (termination) at an earlier step.

Step 6: After scanning to determine fluorescence, the fluorophore is removed or deactivated. Deactivation of the fluorophore can be achieved by a photodestruction event. The chain elongation block is reversed (usually by removing a blocking group to expose the 3′OH) by suitable methods that depend on the particular base analogs chosen; and the substrate is washed in preparation for the next round of polymerization.

Step 7: Repeat the cycle.
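Two quantitative points in the steps above lend themselves to a rough model: the per-round loss of chains under terminator strategy (c), and the out-of-phase background that the capping of Step 5 is meant to suppress. The Python sketch below is illustrative only. It assumes that labeled and unlabeled terminators are incorporated with equal efficiency (so the labeled fraction depends only on the concentration ratio), that extension efficiency is constant from round to round, and that lagging chains stay behind; none of these assumptions, nor the function names, come from this disclosure.

```python
from typing import List, Tuple

def labeled_fraction(ratio: float) -> float:
    """Fraction of chains receiving a fluorescent terminator in one round.

    ratio: concentration of fluorescent terminators divided by that of the
    unlabeled reversible terminators (equal incorporation efficiency assumed).
    """
    return ratio / (1.0 + ratio)

def signal_by_cycle(ratio: float, cycles: int) -> List[float]:
    """Relative labeled signal per round when labeled chains are lost permanently."""
    f = labeled_fraction(ratio)
    return [f * (1.0 - f) ** (n - 1) for n in range(1, cycles + 1)]

def phase_fractions(efficiency: float, cycles: int, capped: bool) -> List[Tuple[float, float]]:
    """Return (in_phase, lagging) chain fractions after each round.

    efficiency: fraction of extendable chains that add a base in a round.
    capped: if True, chains that fail to extend are permanently capped and
    contribute no out-of-phase signal in later rounds.
    """
    results: List[Tuple[float, float]] = []
    in_phase, lagging = 1.0, 0.0
    for _ in range(cycles):
        failed = in_phase * (1.0 - efficiency)
        in_phase *= efficiency
        if not capped:
            lagging += failed  # uncapped failures remain extendable, one base behind
        results.append((in_phase, lagging))
    return results

# One labeled terminator per nine unlabeled ones: 10% of chains are labeled per round.
signals = signal_by_cycle(1.0 / 9.0, 3)
uncapped = phase_fractions(0.9, 2, capped=False)
capped = phase_fractions(0.9, 2, capped=True)
```

Under these assumptions a low ratio trades signal per round for a slower decline over many rounds, and capping removes the lagging fraction without changing the in-phase fraction.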

B. Chain Degradation Method

This method involves labeling the last base of the chain (distal to the surface attachment) with a fluorescent tag followed by base-specific removal. All the polynucleotide clusters on the matrix are labeled using a standard labeling moiety. Base-specific removal of the last base of each chain, interspersed with fluorescence scanning of the array, will reveal the disappearance of fluorescence and hence the identity of the last base of each chain. When all four labeled end bases have been removed, the polymers attached to the matrix are relabeled and the process is repeated, working successively on the DNA chains.

Alternatively, if the label allows distinguishing between different monomers, simpler degradation processes may be employed. A single scan step can distinguish between all four possible terminal nucleotides. The four separate removal steps are then combined into a single nonspecific terminal nucleotide removal step.

The DNA will usually be attached to the substrate by the 3′ or 5′ terminus depending on the scheme of labeling and cleavage. Because there are well-known 5′ labeling methods, see, e.g., Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford, this discussion will assume the 3′ end is attached to the substrate with the 5′ end free.

Step 1: All the 5′-end bases are labeled with 5′ specific chemistry, e.g., 5′ amino linkage to FITC, Nelson et al. (1989) Nucl. Acids Res. 17:7179-7186, which is hereby incorporated herein by reference.

Step 2: Scan the matrix to obtain the background level.

Step 3: Optional: Cap all of the labeling failures, e.g., polymers whose ends were not labeled.

Step 4: The terminal A's are removed with end-base, A-specific reagents (such a reagent may be chemical or biological). One example is a 5′-fluorescein-dAMP-specific exonuclease made as a catalytic antibody (see the description above for a scheme of producing this reagent).

Step 5: Scan the matrix to detect those chains that had terminated in A (these will be reduced in fluorescence compared to the fluorescently labeled background).

Step 6: Repeat steps 4 and 5 for each of the other three possible bases using the appropriate fluorescein-base-specific cleavage reagent and scan after removal of each of the C's, the G's, and the T's. This succession of steps will allow the determination of the terminal nucleotide of each positionally defined cluster.

Step 7: Relabel the 5′ terminal nucleotide of all the new end bases that have been exposed by the earlier rounds of cleavage, and repeat the stepwise removal and scanning processes.
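The cycle of Steps 1 through 7 can be simulated in a few lines, which may help in following why a newly exposed base is not removed within the same round: it carries no label until Step 7. The Python sketch below is an idealized illustration with perfect labeling and perfectly specific removal; the function name and the data layout (matrix position mapped to a sequence written 5′ to 3′ with the 5′ end free) are assumptions for the example.

```python
from typing import Dict

def degrade_and_read(clusters: Dict[str, str], cycles: int, order: str = "ACGT") -> Dict[str, str]:
    """Simulate labeled, base-specific degradation and return the bases read per position.

    clusters: position -> sequence written 5' to 3' (3' end attached, 5' end free).
    order: order in which the base-specific removal reagents are applied each cycle.
    """
    remaining = dict(clusters)
    reads = {pos: "" for pos in clusters}
    for _ in range(cycles):
        # Steps 1 and 7: label every chain that still has a 5' terminal base.
        labeled = {pos for pos, seq in remaining.items() if seq}
        for base in order:
            # Step 4: the reagent removes only labeled terminal bases of this type.
            lost = {pos for pos in labeled if remaining[pos][0] == base}
            # Step 5: the scan shows loss of fluorescence at exactly these positions.
            for pos in lost:
                remaining[pos] = remaining[pos][1:]
                reads[pos] += base
            labeled -= lost  # newly exposed ends are unlabeled until the next cycle
    return reads

reads = degrade_and_read({"p1": "GATC", "p2": "AAC"}, cycles=3)
# reads == {"p1": "GAT", "p2": "AAC"}
```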

This approach can be extended to protein sequencing using 20 catalytic antibodies (or other amino acid-specific cleavage reagents), each recognizing a terminal amino acid and removing that terminal residue.

The process for sequencing may be summarized as follows for enzymatic polymerization:

1) Target DNA templates (to be sequenced) are attached at positionally defined locations on the matrix substrate.

2) Fluorescent chain terminators are added to a primer under conditions where all polymer chains are terminated after addition of the next base complementary to the template.

3) The matrix is scanned to determine which base was added to each location. This step correlates the added base with a position on the matrix.

4) Chains failing to extend (and therefore to terminate) are capped.

5) The fluorophores are removed or deactivated.

6) The terminators are activated for further chain extension, usually byremoval of a blocking group.

Steps 2 through 6 are repeated to obtain the base-by-base sequence ofmany different positionally separated DNA fragments simultaneously.
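The repeated cycle of steps 2 through 6 may be illustrated by the following minimal simulation. It is a sketch under stated assumptions, not a description of any instrument software: templates are given as strings read in the order the polymerase encounters them, every chain is assumed to extend by exactly one base per cycle, and the capping, fluorophore removal and unblocking steps are represented only by comments. The names `sequence_by_synthesis` and `COMPLEMENT` are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]

# Watson-Crick pairing used to pick the terminator complementary to the template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(
    templates: Dict[Position, str], cycles: int
) -> Dict[Position, str]:
    """Return the bases read at each matrix position after the given cycles.

    templates maps a matrix position to its template bases, listed in the
    order in which they are copied starting from the primer.
    """
    reads: Dict[Position, List[str]] = {pos: [] for pos in templates}
    for cycle in range(cycles):
        for pos, template in templates.items():
            if cycle >= len(template):
                continue  # template fully copied; nothing left to add
            # Step 2: one labeled terminator complementary to the template is added.
            added = COMPLEMENT[template[cycle]]
            # Step 3: the scan correlates the added base with this position.
            reads[pos].append(added)
            # Steps 4-6 (capping, fluorophore removal, unblocking) are chemical
            # steps and are not modeled in this sketch.
    return {pos: "".join(bases) for pos, bases in reads.items()}
```

Each returned string is the newly synthesized strand, so the template sequence at a position is its complement.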

C. Screening for New Nucleotide Analog/Polymerase Combination.

The use of a functional combination of blocked nucleotide with a polymerase is important in the synthetic embodiment of the present invention. It is important to ensure that only a single nucleotide is incorporated at the appropriate step. The following protocol describes how to screen for a functional combination.

Test 1 (Test for Polymerase Inhibition)

In a reaction volume of 20 μl, mix

-   1 μM13mp19 single stranded DNA template
-   2.5 ng standard M13 primer (17-mer: 5′-GTTTTCCCAGTCACGAC-3′)
-   60 mM Tris-Cl pH 8.5
-   7.5 mM MgCl₂
-   75 mM NaCl

Templates and primer are annealed by heating to 95° C., then cooling to ~25° C.

Extension components are added:

-   50 μM (each) dATP, dCTP, dGTP, TTP;
-   10 μCi ³²P dATP;
-   0.01 μM to 1 mM of the putative terminator compound (further titrations may be desired);
-   20 units AMV reverse transcriptase;
-   water to 20 μl final volume.

The reaction is run at 42° C. for about 30 minutes.

Aliquots are taken at 10, 20, and 30 minutes, and samples are TCA precipitated after the addition of 10 μg tRNA carrier.

The filters are counted for acid-precipitable radioactivity and the mass of dATP incorporated is calculated as a function of reaction time.

Control reactions are run in parallel consisting of

-   A) no added terminator
-   B) 10 μM and 100 μM

The termination activity of the experimental samples relative to that of ddNTPs is estimated, and a nucleotide is appropriate for further testing if it substantially decreases the number of acid precipitable counts at any time or relative concentration.
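The arithmetic underlying Test 1 may be sketched as follows. The sketch assumes that filter counts are expressed in cpm, that a blank filter value is subtracted, and that the specific activity of the labeled dATP pool (cpm per pmol) has been determined separately; none of these values is specified above, and the function names are hypothetical. The amount incorporated is expressed in pmol, from which a mass may be obtained using the molecular weight of the nucleotide.

```python
def pmol_incorporated(
    cpm_sample: float, cpm_blank: float, specific_activity_cpm_per_pmol: float
) -> float:
    """Convert acid-precipitable counts into pmol of dATP incorporated."""
    if specific_activity_cpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    # Counts at or below the blank are treated as no incorporation.
    net_cpm = max(cpm_sample - cpm_blank, 0.0)
    return net_cpm / specific_activity_cpm_per_pmol


def percent_inhibition(test_pmol: float, control_pmol: float) -> float:
    """Inhibition of incorporation relative to the no-terminator control (A)."""
    if control_pmol <= 0:
        raise ValueError("control incorporation must be positive")
    return 100.0 * (1.0 - test_pmol / control_pmol)
```

Evaluating `percent_inhibition` at each time point and each terminator concentration gives the titration from which the concentrations used in the second test can be estimated.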

Test 2 (Test for Base Specific Termination Activity)

Reactions are run essentially as described by Prober et al. except:

-   1. Unlabelled primer is used
-   2. 1 μCi ³²P dATP is included
-   3. No dideoxyNTPs are added to the experimental samples (control reactions containing ddNTP at the usual concentrations, and no test terminators are
-   4. The test compound is added at a concentration estimated to give 1% and 10% inhibition of incorporation as determined by Test 1.

The reactions are run for 10 min at 42° C.; 100 μM dNTPs are then added and the reaction is run for an additional 10 min. A portion of the reaction is prepared and run on a sequencing gel in the usual fashion. The ladders obtained with the test compound are compared with those obtained in the ddNTP reactions and the fidelity of the termination activity of the test compound is thereby assessed.
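One possible way to express the ladder comparison numerically is sketched below. It assumes that band positions have already been read from the gel as fragment lengths, and it scores the test compound by the fraction of its bands that coincide with each ddNTP control lane. This scoring scheme and the name `termination_fidelity` are illustrative assumptions and are not part of the protocol described above.

```python
from typing import Dict, Set


def termination_fidelity(
    test_bands: Set[int], ddntp_ladders: Dict[str, Set[int]]
) -> Dict[str, float]:
    """Fraction of test-compound bands that match each ddNTP control lane.

    test_bands: fragment lengths observed with the test compound.
    ddntp_ladders: base -> fragment lengths observed in that ddNTP lane.
    """
    if not test_bands:
        raise ValueError("no bands observed for the test compound")
    return {
        base: len(test_bands & bands) / len(test_bands)
        for base, bands in ddntp_ladders.items()
    }
```

A base-specific terminator would be expected to score near 1.0 against one control lane and near 0.0 against the other three.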

III. APPARATUS

The present invention provides a new use for an apparatus comprising a reaction chamber and a scanning apparatus which can scan a substrate material exposed to the chamber. FIG. 11 illustrates a system and a schematized reaction chamber to which is attached a silicon or glass substrate. The system has a detection system 102 as illustrated, in one embodiment, in FIG. 7. A silicon substrate 104 is attached against, and forms a seal to make, a reaction chamber 106. Leading into and out of the chamber are tubes 108, with valves 110 which control the entry and exit of reagents 112 which are involved in the stepwise reactions. The chamber is held at a constant temperature by a temperature block 114.

All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference. The invention now being fully described, it will be apparent to one of ordinary skill in the art that many changes and modifications can be made thereto without departing from the spirit or scope of the claims.

1. A method for analyzing a target nucleic acid comprising: (a) providing a nucleic acid array, said array comprising at least two different positionally distinct target template nucleic acids; (b) contacting said nucleic acid array with at least one nucleotide comprising a removable label; (c) detecting the label to determine the addition of the nucleotide to an oligonucleotide hybridized to one of the template nucleic acids; (d) removing the label; and (e) repeating steps (b) and (c).
 2. The method of claim 1, wherein the at least two different positionally distinct template nucleic acids are attached to different beads.
 3. The method of claim 1, wherein the nucleotide further comprises a removable blocking group that blocks addition of other nucleotides and the method further comprises removing the blocking group following determination of addition of the nucleotide.
 4. The method of claim 3, wherein the removal of the label and the blocking group comprises a reaction that removes both the label and the blocking group.
 5. The method of claim 1, wherein the array is contacted with four differentially labeled nucleotides and the determining step determines which nucleotide is added based on the type of label.
 6. A method for analyzing a target nucleic acid comprising: (a) providing a nucleic acid array, said array comprising at least two different positionally distinct target template nucleic acids; (b) contacting said nucleic acid array with at least one nucleic acid segment of a defined length; and (c) determining the addition of the nucleic acid segment to an oligonucleotide hybridized to one of the template nucleic acids.
 7. The method of claim 6, wherein the at least two different positionally distinct template nucleic acids are attached to different beads.
 8. The method of claim 6, wherein the nucleic acid segment comprises a label and determining the addition of the nucleic acid segment comprises detection of the label.
 9. The method of claim 8, wherein the label is removable.
 10. The method of claim 9, further comprising removing the label following determination of the addition of the nucleic acid segment.
 11. The method of claim 8, wherein the nucleic acid segment further comprises a blocking group that blocks addition of nucleotides.
 12. The method of claim 11, wherein each of the labels and the blocking groups are removable.
 13. The method of claim 12, further comprising removing the label and the blocking group following determination of addition of the nucleic acid segment.
 14. The method of claim 13, wherein the removal of the label and the blocking group comprises a reaction that removes both the label and the blocking group.
 15. The method of claim 13, further comprising repeating steps (b) and (c) following the removal of the label and the blocking group.
 16. The method of claim 8, wherein the array is contacted with differentially labeled nucleotides and the determining step determines which nucleic acid segment is added based on the type of label.